1 From the Neurovascular Service, Department of Neurology, Box 0114, University of California, San Francisco, 505 Parnassus Avenue, San Francisco, CA 94143-0114 (e-mail: clayj{at}itsa.ucsf.edu).
ABSTRACT
confounding factors (epidemiology); epidemiologic methods; intracranial aneurysm; risk assessment
Abbreviations: CI, confidence interval; OR, odds ratio.
INTRODUCTION
Randomized trials are not affected by confounding by indication (1). Only patients who are candidates for all of the treatments being compared are entered into the study. In addition, prognostic factors do not influence treatment decisions, and even unknown factors should be balanced between treatment groups as long as the sample size is adequate.
To try to improve observational studies of treatment effects, my colleagues and I developed a design termed "blinded prospective review." In this design, key patient information up to the point of treatment is presented to practitioners of the therapies being compared. The practitioners are asked to determine whether their particular type of therapy is indicated and to rate the overall prognosis for the patient. Treatment risks and outcomes are compared for cases in which both therapies are judged to be indicated. In this way, only candidates for both therapies are compared, which is similar to a criterion for inclusion in a randomized trial (3, 7). Furthermore, practitioners are asked to commit to an estimation of patient prognosis prior to treatment; these risk assessments can be compared between therapies, potentially responding to concerns about unmeasured confounders and addressing the objection "...but we treated higher-risk cases than they did."
The method is demonstrated here through a comparison of two different treatments for unruptured cerebral aneurysm. Cerebral aneurysm rupture is an important cause of stroke (8), with case fatality rates approaching 50 percent (9). Some aneurysms are discovered before they rupture, either because they produce symptoms, such as new headaches or cranial neuropathies, or because they appear as incidental findings in imaging studies (10). To prevent the devastating effects of rupture, neurosurgeons have treated these unruptured aneurysms by placing a clip over the neck of the aneurysm to isolate it from the circulation, and this has become the standard of care (10, 11).
In the early 1990s, endovascular therapy with platinum coils, performed by interventional radiologists, was introduced as an alternative to surgical clipping (12, 13). In this procedure, platinum coils are packed into an aneurysm through an endovascular microcatheter, excluding the aneurysm from the circulation. An increasing number of medical centers are treating ruptured and unruptured aneurysms with endovascular coil embolization (14, 15).
Current data on these therapies are largely limited to case series. The reported risks of endovascular and surgical treatment of unruptured aneurysms have been similar, with 2 percent mortality and 5-10 percent morbidity (16-20). However, these observational studies have not compared therapies directly, definitions of treatment indications and outcomes have varied, and confounding by indication is probable. In addition, risk factors for poor outcomes have not been clearly defined. Patient age and prior morbidity from the aneurysm and other medical problems are generally considered important risk factors for surgery (21). In addition, aneurysm characteristics such as overall diameter, neck diameter, and location are felt to contribute to treatment risk. The location of the aneurysm may affect the risks of surgery and endovascular therapy differently, because aneurysms near the surface of the brain, such as those in the middle cerebral and anterior cerebral arteries, are easier to approach by surgery and more difficult to reach by endovascular therapy; the opposite is true for deeper aneurysms, such as those in the posterior circulation and the cavernous segment of the internal carotid (14, 21).
The University of California, San Francisco, is a major referral center for cerebral aneurysms, and nearly half of its patients are treated with endovascular techniques. The service initially consulted, neurosurgery or interventional radiology, generally treats the patient. Specific case characteristics are less important in treatment decisions, which provides researchers with an opportunity to compare procedures if differences in preprocedural risks can be identified and controlled.
My colleagues and I studied a cohort of patients treated for unruptured aneurysm at this institution after the introduction of coil embolization in 1990. The clinical implications of short- and long-term outcomes were discussed previously (22). In this article, the validity and utility of the blinded prospective review design are evaluated.
MATERIALS AND METHODS
Blinded review
Independent reviewers served on two specialist panels; the three neurosurgeons and three interventional radiologists with the highest volumes of treated unruptured cerebral aneurysms agreed to participate. We attempted to reproduce the usual referral process, in which key radiographs and a letter describing the patient are reviewed for consideration of treatment. Each physician, blinded with regard to treatment modality and outcome, reviewed the following information: 1) patient age; 2) symptoms, signs, and past medical history upon presentation, based on abstraction of medical records; and 3) relevant preprocedural radiographic images. The physicians estimated that cases required approximately 2 minutes each to review. Physicians who were familiar with the treatment applied in a given case were excluded from its review. For testing of intrarater reliability, 11 randomly selected cases were presented twice, separated by more than 90 days.
Inclusion
We were interested in comparing outcomes only among those cases in which the patient could have received either procedure, since candidates for only one of the procedures would not require making a decision between therapies and would not have been entered into a randomized trial. Indications for the different therapies have not been formalized, so we incorporated an inclusion question into the blinded review of cases. Each reviewing physician was asked to consider the question, "If my procedure were the only option available for treatment, would the risks of not treating justify proceeding with this procedure?" In this way, opinions about whether a patient was a candidate for each procedure were gathered from the physicians who performed the procedure using the same information normally available in clinical practice. The inclusion question was meant to mimic selection of patients for a randomized trial. In this instance, the practitioner must determine whether the patient would be a candidate for his/her procedure if randomization led to the case's selection. The practitioner cannot choose an alternative therapy for the patient.
To determine the reliability of physicians' determinations of inclusion, we compared intrarater and interrater responses using kappa statistics (23). Interrater comparisons were performed separately for each specialty, since indications for the two procedures were likely to differ and there was no reason to expect a neurosurgeon and an interventional radiologist to agree about the appropriateness of treatment. To determine which patient and aneurysm characteristics predicted a rating that treatment was justified, we used logistic regression with stepwise elimination of variables not contributing to the model (defined as p > 0.10).
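The agreement measure used for the inclusion question can be illustrated with a minimal sketch of the unweighted kappa statistic for two raters. The function name `cohens_kappa` and the example ratings are hypothetical and are not taken from the study data; the sketch only shows why a high percentage of agreement can coexist with a modest kappa when most cases are rated as justified.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted kappa for two raters scoring the same cases.

    Each argument is a sequence of category labels (for example,
    1 = treatment justified, 0 = not justified), in the same case order.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(ratings_a)
    # Observed proportion of cases on which the raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings: 80 percent raw agreement, but most cases are "justified",
# so chance agreement is high and kappa is only 0.375.
rater_1 = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
rater_2 = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]
kappa = cohens_kappa(rater_1, rater_2)
```

In this made-up example the chance-expected agreement is 0.68, so 80 percent observed agreement corresponds to a kappa of (0.80 - 0.68) / (1 - 0.68) = 0.375.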
Risk assessment
Systematic differences in preprocedural risk between treatment groups could persist even after adjustment for known prognostic variables. Therefore, we incorporated a question about the overall preprocedural risk for a case in the blinded review. Each physician rated anticipated procedure risk on a four-point Likert scale, as slight (<1 percent morbidity), low (1-5 percent morbidity), moderate (5-10 percent morbidity), or high (>10 percent morbidity). Factors responsible for elevated risk in a given case were chosen from a list or written.
To test reproducibility, we compared physician-anticipated risk assessments using linearly weighted kappa (κ) statistics (23) and Spearman rank correlation coefficients (Rs), since risk assessments were ordinal (24). During initial testing, we found that physicians disagreed about the median risk but generally agreed about the relative risk of cases, which indicated a difference in calibration between physicians. This could have produced unbalanced overall risk assessments, since physicians who limited their responses to the top or bottom of the Likert scale would contribute less to a mean assessment than those who used its full range. To eliminate calibration differences and give each physician equal weight in the mean risk assessments, we transformed each physician's ratings to z scores so that the mean response for each reviewer was 0 with a standard deviation of 1 (24). For risk assessments, we calculated intraclass correlation coefficients for each specialty to determine the portion of the total variance explained by differences between patients rather than between physician assessments for a given patient. For each patient, z scores of a given specialty were averaged so that each reviewer contributed equally to a specialty's risk assessment. To define the determinants of assessed risk, we analyzed these z scores as dependent variables in stepwise linear regression of case characteristics. Factors identified as important to determining risk were compared across specialty using chi-squared statistics.
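The calibration step described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: the function names, the dictionary layout, and the example ratings are hypothetical, and the population standard deviation is assumed because the text does not say which form was used.

```python
from statistics import mean, pstdev


def zscores_by_reviewer(ratings):
    """Standardize each reviewer's Likert ratings to mean 0 and SD 1.

    ratings: dict mapping reviewer -> dict mapping case id -> rating (1-4).
    Assumption: population SD (pstdev); the source does not specify.
    """
    standardized = {}
    for reviewer, by_case in ratings.items():
        values = list(by_case.values())
        mu = mean(values)
        sd = pstdev(values)
        if sd == 0:
            raise ValueError(f"reviewer {reviewer!r} gave identical ratings to every case")
        standardized[reviewer] = {case: (r - mu) / sd for case, r in by_case.items()}
    return standardized


def specialty_risk(standardized):
    """Average the z scores of one specialty's reviewers for each case."""
    cases = set().union(*(by_case.keys() for by_case in standardized.values()))
    return {
        case: mean(by_case[case] for by_case in standardized.values() if case in by_case)
        for case in sorted(cases)
    }


# Hypothetical panel: reviewer B rates every case one point higher than reviewer A,
# yet both rank the cases identically, so their z scores coincide.
panel = {
    "A": {"case1": 1, "case2": 2, "case3": 3},
    "B": {"case1": 2, "case2": 3, "case3": 4},
}
z = zscores_by_reviewer(panel)
risk = specialty_risk(z)
```

Because the two hypothetical reviewers differ only in calibration, standardization removes the offset and each contributes equally to the averaged specialty assessment.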
Outcome
The occurrence of a procedure-related complication was the primary outcome for this analysis. Procedure-related complications were defined as those that were clearly consequences of the procedure and that resulted in prolonged hospitalization or a change in Rankin Scale (25) score of 1 or more points at discharge. This outcome was felt to be important, because it reflected morbidity and resource consumption. Complications were determined independently by two neurologists who were blinded as to treatment modality, and the determinations included the entire course of treatment (22). Agreement on whether a procedure-related complication occurred was "substantial" (23) (κ = 0.89), and disagreements were resolved by discussion between the reviewing neurologists.
It was important to determine whether the risk assessments provided important information not present in multivariable models that included measured case characteristics. The relative contributions of risk assessments to the overall fit of multivariable models of complication risk were evaluated using logistic regression for each procedure. All candidate predictors were entered into the model, and then individual predictors were removed; the fits of these models were compared using the deviance, defined as the difference in -2(log likelihood) (26, 27). This approach provides a unitless measure of the contribution of a given variable to the fit of a complete multivariable model. When deviance is calculated for a model with a single variable removed, the significance of that variable's contribution to the model can be tested using the chi-squared test with 1 df.
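The deviance comparison can be made concrete with a short sketch. The log-likelihood values below are hypothetical placeholders, not results from the study, and the function names are illustrative. The p-value uses the identity that a chi-squared variable with 1 df is the square of a standard normal variable, so its upper tail probability equals erfc(sqrt(x / 2)).

```python
import math


def deviance_change(loglik_full, loglik_reduced):
    """Difference in -2(log likelihood) between a reduced and a full model.

    The reduced model is the full model with one predictor removed, so the
    value is non-negative when both models are fit to the same data.
    """
    return -2.0 * loglik_reduced - (-2.0 * loglik_full)


def chi2_upper_tail_1df(x):
    """Upper tail probability of a chi-squared distribution with 1 df."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical log likelihoods: full model vs. model without the risk assessment.
change = deviance_change(loglik_full=-50.0, loglik_reduced=-53.0)
p_value = chi2_upper_tail_1df(change)
```

With these placeholder values the deviance change is 6.0, which exceeds the 1-df critical value of about 3.84, so the removed variable would be judged a significant contributor at the 0.05 level.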
In evaluating the relative safety of the procedures, it was necessary to form a final cohort of patients who could have received either therapy. To do this, we used the inclusion question from the blinded review and excluded cases for which the majority of practitioners in either specialty stated that treatment was not indicated. That is, if two out of three neurosurgeons or interventional radiologists judged that treatment of a patient was not justified, that case was eliminated from the final cohort. Patient characteristics were compared before and after exclusion using Student's t test for continuous variables and Fisher's exact test for categorical variables. In the final cohort, complications of the two procedures were compared.
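The exclusion rule described above can be written as a small decision function. This is a sketch with hypothetical names; it assumes a strict-majority rule, so that a panel reduced to two reviewers (for example, after a reviewer familiar with the case was excluded) would not eliminate a case on a tied vote. The source does not state how ties were handled.

```python
def panel_rejects(votes):
    """Return True when a strict majority of a panel says treatment is not justified.

    votes: list of booleans, True meaning "treatment with my procedure is justified".
    """
    not_justified = sum(1 for v in votes if not v)
    return 2 * not_justified > len(votes)


def in_final_cohort(neurosurgeon_votes, radiologist_votes):
    """A case stays in the cohort only if neither specialty panel rejects it."""
    return not (panel_rejects(neurosurgeon_votes) or panel_rejects(radiologist_votes))


# Hypothetical cases: one dissenting neurosurgeon does not exclude a case,
# but two dissenting neurosurgeons do.
kept = in_final_cohort([True, True, False], [True, True, True])
dropped = in_final_cohort([True, False, False], [True, True, True])
```

Using the panel majority rather than any single reviewer's answer is what limits the influence of one discordant practitioner on cohort membership.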
We used logistic regression analysis to determine how the risk assessment contributed to comparison of risks with surgery or endovascular therapy. Odds ratios for risk of complications with surgery as compared with endovascular therapy were determined on the basis of models including patient characteristics, models including risk assessments, and models including both. In this way, we could evaluate whether the risk assessment variable suggested that a bias was present in less complete models.
Since risk was likely to be determined differently by neurosurgeons and interventional radiologists, interactions between risk and procedure were likely. That is, surgery might be safer in the group of patients with the lowest risk assessment according to the neurosurgeons and the highest risk according to interventional radiologists. To evaluate this possibility, we divided risk assessments for each specialty into high- and low-risk groups at the mean values. Complications were assessed for all combinations of level of risk. Heterogeneity was tested and odds ratios were combined using Mantel-Haenszel methods (28).
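A minimal sketch of the Mantel-Haenszel pooled odds ratio across risk strata is given below. The stratum counts are invented for illustration and do not come from the study, the function name is hypothetical, and the heterogeneity test mentioned above is not implemented here.

```python
def mantel_haenszel_or(strata):
    """Pooled odds ratio across 2x2 strata using the Mantel-Haenszel estimator.

    strata: iterable of (a, b, c, d) tuples, where
      a = surgery cases with a complication,
      b = surgery cases without a complication,
      c = endovascular cases with a complication,
      d = endovascular cases without a complication.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("pooled odds ratio is undefined for these strata")
    return numerator / denominator


# Hypothetical counts for two strata (for example, low and high assessed risk).
example_strata = [(10, 10, 5, 15), (8, 12, 4, 16)]
pooled_or = mantel_haenszel_or(example_strata)
```

For these invented counts the stratum-specific odds ratios are 3.0 and about 2.67, and the pooled estimate of about 2.84 lies between them, weighted by stratum size.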
The Stata statistical package (version 5.0; Stata Corporation, College Station, Texas) was used for all analyses.
RESULTS
Inclusion
There was fair agreement (23) between duplicate inclusion assessments made by the same reviewers more than 3 months apart (n = 11; 86 percent agreement; κ = 0.29). Most of the disagreement was produced by a single practitioner on his first day, and reproducibility was substantial when he was eliminated (95 percent agreement; κ = 0.64). Treatment was considered justified in 57-90 percent of the cases by neurosurgeons and in 81-96 percent by interventional radiologists. Agreement was fair between reviewers in a specialty (73-88 percent agreement; κ = 0.27-0.48). The blinded review excluded 49 cases (figure 1), leaving 130 cases in which patients were judged to be candidates for either therapy. Four cases were eliminated by the specialists who had performed the treatment. None of the intrarater disagreements affected inclusion of a case in the final cohort, because other practitioners were unanimous in their decisions, so recommendation of inclusion by the panels was completely reproducible (κ = 1.00).
After exclusion of cases in which treatment was considered unjustified by practitioners of either specialty, treatment groups were generally more similar (table 1).
Risk and outcome
For each procedure, higher preprocedural risk assessments were associated with greater risk of complications in unadjusted analyses (risk of a procedural complication per standard deviation of assessed risk: for surgical cases, odds ratio (OR) = 3.2, 95 percent confidence interval (CI): 1.6, 6.4; p = 0.001; for endovascular cases, OR = 2.3, 95 percent CI: 1.1, 4.8; p = 0.03). Risk assessments significantly contributed to multivariable models adjusted for case characteristics (table 3). Furthermore, the relative impact of risk assessment on the fit of the complete multivariable models was high, ranking first for surgery cases and second for endovascular cases. Predictors of complications differed for surgery and endovascular therapy (table 3).
Treatment and outcome
Neurosurgeons rated the group of patients treated by endovascular therapy as higher-risk than the surgery group (table 1). Conversely, interventional radiologists rated the surgical cases as higher-risk. When risk assessments were combined across specialties, overall risk assessments were the same in the two groups.
A complication occurred in 46 percent of surgical cases and 23 percent of endovascular therapy cases. Logistic regression models revealed greater disparity in risk of a complication after adjustment for case characteristics alone, after adjustment for overall risk assessments alone, and in a full model with risk assessments and case characteristics included (table 4).
DISCUSSION
Choosing the best treatment is only important for patients who are candidates for more than one therapy. Therefore, observational studies of treatment effects should attempt to define a cohort of patients who could have received any of the compared therapies, as is required for inclusion in randomized trials (7). However, there is often disagreement between physicians regarding perceived indications for treatment, which is evident in the practice of obtaining second opinions.
Blinded prospective review offers a method of evaluating perceived indications for treatment without bias due to hindsight about a specific treatment decision or outcome. We found "fair" agreement between intrarater assessments separated by more than 3 months, and agreement improved dramatically when a single misguided reviewer was removed from the analysis. As expected, there was variability between practitioner pairs in determination of whether treatment was justified, with kappa values of 0.27-0.48 between practitioners in a specialty. The neurosurgeon with the lowest number of treated cases most frequently rated treatment as not indicated, as would be anticipated for a practitioner who is less comfortable with a procedure. To reduce the effect of an individual physician's input in a medical system that often includes second opinions, we used the majority assessment of panels of three practitioners to determine whether a case would be included in the final cohort. This system produced perfect agreement in test-retest analysis, confirming it as a reproducible approach to defining a cohort of patients who could have received either therapy.
It is impossible to determine whether treatment in a given case is justified, since procedural complications will vary between practitioners within a specialty and because the risks of not treating are often poorly understood. Therefore, it is impossible to directly validate a decision to recommend treatment. However, several findings are reassuring that inclusion decisions were valid in this study. First, very few of the cases eliminated were excluded by the specialists who had performed the treatment (figure 1). Second, physicians' determinations were related to a number of perceived and confirmed risk factors for poor outcomes. Third, risk factors generally differed less between treatment groups after the exclusion process (table 1). Together, these findings argue for the validity of the exclusion component of the blinded prospective review. Panels of practitioners who are blinded with regard to treatment modality and outcome can reproducibly create a cohort of patients judged to be candidates for all treatments being evaluated.
Assurance that members of treatment groups could have received either therapy does not guarantee that preprocedure prognosis and case complexity are similar in the treatment groups. Prognostication is the role of the treating physician, in a process involving synthesis of a large number of clinical variables, study results, and experience. Physicians are often skeptical, perhaps correctly, that any combination of factors in a multivariable model is adequate to control for preprocedure prognosis (1). Practitioners' concerns about residual uncontrolled confounding often cannot be addressed in observational studies, and this may reduce the impact of the findings.
Defining practitioners' perceptions of preprocedural risk allows incorporation of the complex, ill-defined prognostication process into a study. Without knowledge of case treatment and outcome, practitioners commit to an assessment of preprocedural risk, the latent variable researchers attempt to model with multivariable statistics. In this demonstration, we found physicians' assessments of preprocedural risk to be highly reproducible upon test-retest analysis (weighted κ = 0.88, n = 11) and highly correlated between physicians in a specialty (for endovascular cases, R = 0.90; for surgery, R = 0.87). These results suggest that reproducible assessment of preprocedural risk is possible.
To determine whether risk assessments were valid representations of patient prognosis, we tested risk assessments as predictors of procedural complications. Overall risk assessments for each specialty were generated by combining risk assessments within each specialty after normalization. Neurosurgeons predicted complications of surgery and interventional radiologists predicted complications of endovascular therapy, which suggests that the risk assessments were meaningful.
Next we sought to determine whether physician risk assessment contributed information that was unavailable in a multivariable model of case characteristics. For both procedures, multivariable models that included all case characteristics were significantly improved by the addition of risk assessment variables, and risk assessments ranked first and second in impact on the overall models (table 3). Thus, the risk assessments provided important information that was missing from the collection of case characteristics.
Finally, it was important to determine whether the risk assessment process enhanced the validity of the study. In this demonstration, the crude risk of complications was much greater in the group treated surgically, and the difference persisted after adjustment for case characteristics (table 4). Residual confounding by indication could account for the outcome difference even after adjustment for case characteristics, since an important prognostic factor may not have been included. However, overall risk assessments were identical in the two treatment groups, and surgeons rated the group treated by endovascular therapy as higher-risk (table 1). Therefore, it would be difficult for a neurosurgeon to argue that he or she has treated the higher-risk cases: He/she has indicated otherwise. The possibility that a prognostic factor is unbalanced between groups cannot be ruled out, but this factor would need to be unknown or underappreciated by practitioners of the compared therapies. Knowledge of a prognostic factor is required for confounding by indication to occur, since only then can the prognostic factor affect treatment decisions (1). Therefore, unknown prognostic factors may be confounders, but they do not produce confounding by indication, the real bane of observational treatment studies.
Even though endovascular therapy may be safer overall, it is possible that some subgroups of patients would be treated more safely with surgery. Specifically, cases with high risk for endovascular therapy and low risk for surgery may be best treated by surgery. Furthermore, risk was assessed differently for the two procedures, and prognostic factors also differed, so selection could potentially improve care even further than treating all patients with endovascular therapy. The risk assessment process allows evaluation of this possibility. Patients were stratified into high- and low-risk groups for each procedure (figure 2). We found that complications were more frequent for surgery in all strata, including the one with high endovascular risk and low surgical risk. Therefore, the finding of greater safety for endovascular therapy was robust across risk groups.
This study design will not be useful for all therapies. For the design to control confounding by indication, information used to make treatment decisions and to prognosticate must be accurately recreated and presented to practitioners as though the case were being treated prospectively. If important information is available to the treating physician but not to the reviewer, inadequate estimation of the preprocedural risk may result, and confounding by indication may remain in the final analysis. The design was practical in our study because abstracted patient information and radiographic images are routinely the only information used to identify candidates for the compared therapies. If a patient examination had been part of the preprocedural decision-making, it might not have been possible to accurately reproduce the clinical pretreatment scenario, since an examination may identify factors that are not readily quantifiable. This limits the usefulness of the design to those situations in which the important clinical variables can be accurately recreated and presented to the reviewers.
Generalization outside the study center may be affected by variability in case selection and risk assessment. However, this is not a problem unique to this study design. Certainly, procedural skills also vary between institutions. Integrating knowledge about case selection and perceived risk clarifies a center's practice pattern. This may enhance generalizability by allowing readers to compare their case selection with that in the study.
Randomized controlled trials provide the only reliable method with which to assure validity in studies of treatment effects (1, 2). However, randomized trials are not always possible because of cost, practicalities, and ethical concerns (3, 7, 29). Therefore, reliable observational methods are required. Blinded prospective review may offer a method for improving observational studies of treatment effects by identifying a cohort of patients who could have received either of the compared therapies, and by incorporating practitioners' assessments of preprocedural risk into the analysis. In this way, confounding by indication can be identified and controlled more completely than in traditional, retrospective study designs.
ACKNOWLEDGMENTS
The author thanks Drs. Ira Tager and Alan Hubbard for assistance with design development and for important editorial comments.