Identifying Confounding by Indication through Blinded Prospective Review

S. Claiborne Johnston1

1 From the Neurovascular Service, Department of Neurology, Box 0114, University of California, San Francisco, 505 Parnassus Avenue, San Francisco, CA 94143-0114 (e-mail: clayj@itsa.ucsf.edu).


    ABSTRACT
 
Confounding by indication is a relentless threat to validity in observational studies of treatment effects. Multivariable models allow adjustment for known and readily measurable prognostic factors, but they may incompletely or inaccurately represent the underlying overall perceived risk of treatment. To incorporate practitioners' judgments about treatment indication and preprocedural prognosis into an observational study of cerebral aneurysm treatments, the author and colleagues presented patient characteristics and radiographic images from 179 aneurysm cases (University of California, San Francisco, 1990–1997) to panels of practitioners who were blinded as to actual treatment selection and outcome. In this way, the review process was designed to recreate the presentation of information in a prospective study. Judgments about inclusion and prognosis were reproducible. Perceived prognosis correlated with complication rates and provided information not present in a multivariable model including all available clinical characteristics. The association between treatment modality and outcome was examined while stratifying and adjusting for differences in perceived prognosis. Blinded prospective review may provide an unbiased observational study design with which to define a cohort that could have received any of the treatments being compared and to measure and adjust for overall perceived procedural risk.

confounding factors (epidemiology); epidemiologic methods; intracranial aneurysm; risk assessment

Abbreviations: CI, confidence interval; OR, odds ratio.


    INTRODUCTION
 
Evaluating treatment effects from observational data is problematic. Prognostic factors may influence treatment decisions, producing a type of bias referred to as "confounding by indication" (1–5). Controlling for known prognostic factors may reduce this problem (6), but it is always possible that a forgotten or unknown factor was not included or that factors interact complexly (7). Confounding by indication has been described as the most important limitation of observational studies of treatment effects (2).

Randomized trials are not affected by confounding by indication (1). Only patients who are candidates for all of the treatments being compared are entered into the study. In addition, prognostic factors do not influence treatment decisions, and even unknown factors should be balanced between treatment groups as long as the sample size is adequate.

To try to improve observational studies of treatment effects, my colleagues and I developed a design termed "blinded prospective review." In this design, key patient information up to the point of treatment is presented to practitioners of the therapies being compared. The practitioners are asked to determine whether their particular type of therapy is indicated and to rate the overall prognosis for the patient. Treatment risks and outcomes are compared for cases in which both therapies are judged to be indicated. In this way, only candidates for both therapies are compared, which is similar to a criterion for inclusion in a randomized trial (3, 7). Furthermore, practitioners are asked to commit to an estimation of patient prognosis prior to treatment; these risk assessments can be compared between therapies, potentially responding to concerns about unmeasured confounders and addressing the objection "...but we treated higher-risk cases than they did."

The method is demonstrated here through a comparison of two different treatments for unruptured cerebral aneurysm. Cerebral aneurysm rupture is an important cause of stroke (8), with case fatality rates approaching 50 percent (9). Some aneurysms are discovered before they rupture, either because they produce symptoms, such as new headaches or cranial neuropathies, or because they appear as incidental findings in imaging studies (10). To prevent the devastating effects of rupture, neurosurgeons have treated these unruptured aneurysms by placing a clip over the neck of the aneurysm to isolate it from the circulation, and this has become the standard of care (10, 11).

In the early 1990s, endovascular therapy with platinum coils, performed by interventional radiologists, was introduced as an alternative to surgical clipping (12, 13). In this procedure, platinum coils are packed into an aneurysm through an endovascular microcatheter, excluding the aneurysm from the circulation. An increasing number of medical centers are treating ruptured and unruptured aneurysms with endovascular coil embolization (14, 15).

Current data on these therapies are largely limited to case series. The reported risks of endovascular and surgical treatment of unruptured aneurysms have been similar, with 2 percent mortality and 5–10 percent morbidity (16–20). However, these observational studies have not compared therapies directly, definitions of treatment indications and outcomes have varied, and confounding by indication is probable. In addition, risk factors for poor outcomes have not been clearly defined. Patient age and prior morbidity from the aneurysm and other medical problems are generally considered important risk factors for surgery (21). In addition, aneurysm characteristics such as overall diameter, neck diameter, and location are felt to contribute to treatment risk. The location of the aneurysm may affect the risks of surgery and endovascular therapy differently, because aneurysms near the surface of the brain, such as those in the middle cerebral and anterior cerebral arteries, are easier to approach by surgery and more difficult to reach by endovascular therapy; the opposite is true for deeper aneurysms, such as those in the posterior circulation and the cavernous segment of the internal carotid (14, 21).

The University of California, San Francisco, is a major referral center for cerebral aneurysms, and nearly half of its patients are treated with endovascular techniques. The service initially consulted, neurosurgery or interventional radiology, generally treats the patient. Specific case characteristics are less important in treatment decisions, which provides researchers with an opportunity to compare procedures if differences in preprocedural risks can be identified and controlled.

My colleagues and I studied a cohort of patients treated for unruptured aneurysm at this institution after the introduction of coil embolization in 1990. The clinical implications of short- and long-term outcomes were discussed previously (22). In this article, the validity and utility of the blinded prospective review design are evaluated.


    MATERIALS AND METHODS
 
Case ascertainment
Treated cases of unruptured aneurysm were identified from the hospital discharge database at the medical center of the University of California, San Francisco, as previously described (22). The following initial inclusion criteria were applied: attempted surgical or endovascular treatment of an unruptured aneurysm (craniotomy performed or coil deployed), age ≥18 years at follow-up, no associated arteriovenous malformation, no subarachnoid hemorrhage from a different aneurysm within 6 months before treatment, and no second aneurysm treated during a different surgical procedure within 2 months. For cases in which both types of treatments were tried, treatment modality was classified as the first attempted therapy, analogously to an intention-to-treat analysis.

Blinded review
Independent reviewers served on two specialist panels; the three neurosurgeons and three interventional radiologists with the highest volumes of treated unruptured cerebral aneurysms agreed to participate. We attempted to reproduce the usual referral process, in which key radiographs and a letter describing the patient are reviewed for consideration of treatment. Each physician, blinded with regard to treatment modality and outcome, reviewed the following information: 1) patient age; 2) symptoms, signs, and past medical history upon presentation, based on abstraction of medical records; and 3) relevant preprocedural radiographic images. The physicians estimated that cases required approximately 2 minutes each to review. Physicians who were familiar with the treatment applied in a given case were excluded from its review. For testing of intrarater reliability, 11 randomly selected cases were presented twice, separated by more than 90 days.

Inclusion
We were interested in comparing outcomes only among those cases in which the patient could have received either procedure, since candidates for only one of the procedures would not require making a decision between therapies and would not have been entered into a randomized trial. Indications for the different therapies have not been formalized, so we incorporated an inclusion question into the blinded review of cases. Each reviewing physician was asked to consider the question, "If my procedure were the only option available for treatment, would the risks of not treating justify proceeding with this procedure?" In this way, opinions about whether a patient was a candidate for each procedure were gathered from the physicians who performed the procedure using the same information normally available in clinical practice. The inclusion question was meant to mimic selection of patients for a randomized trial. In this instance, the practitioner must determine whether the patient would be a candidate for his/her procedure if randomization led to the case's selection. The practitioner cannot choose an alternative therapy for the patient.

To determine the reliability of physicians' determinations of inclusion, we compared intrarater and interrater responses using kappa statistics (23). Interrater comparisons were performed separately for each specialty, since indications for the two procedures were likely to differ and there was no reason to expect a neurosurgeon and an interventional radiologist to agree about the appropriateness of treatment. To determine which patient and aneurysm characteristics predicted a rating that treatment was justified, we used logistic regression with stepwise elimination of variables not contributing to the model (defined as p > 0.10).
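
As an illustration of the agreement measure mentioned above, the following is a minimal sketch of an unweighted Cohen's kappa for two raters who each answer the inclusion question for the same cases. The function name and the ten example ratings are hypothetical and are not taken from the study, which was analyzed in Stata.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical inclusion decisions (1 = treatment justified, 0 = not justified).
reviewer_1 = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
reviewer_2 = [1, 1, 0, 0, 1, 1, 1, 1, 1, 1]
kappa = cohen_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))
```

In this made-up example the raters agree on 8 of 10 cases, yet kappa is only 0.375, because most cases are rated "justified" by both and chance agreement is therefore high. This is the same pattern of high percent agreement with modest kappa that can arise when one response dominates.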

Risk assessment
Systematic differences in preprocedural risk between treatment groups could persist even after adjustment for known prognostic variables. Therefore, we incorporated a question about the overall preprocedural risk for a case in the blinded review. Each physician rated anticipated procedure risk on a four-point Likert scale, as slight (<1 percent morbidity), low (1–5 percent morbidity), moderate (5–10 percent morbidity), or high (>10 percent morbidity). Factors responsible for elevated risk in a given case were chosen from a list or written.

To test reproducibility, we compared physician-anticipated risk assessments using linearly weighted kappa (κ) statistics (23) and Spearman rank correlation coefficients (Rs), since risk assessments were ordinal (24). During initial testing, we found that physicians disagreed about the median risk but generally agreed about the relative risk of cases, which indicated an error in calibration between physicians. This could have produced unbalanced overall risk assessments, since those physicians limiting responses to the top or bottom of the Likert scale would contribute less to a mean assessment than those using its full range. To eliminate calibration differences and give each physician equal weight in the mean risk assessments, we transformed each physician's ratings to z scores so that the mean response for each reviewer was 0 with a standard deviation of 1 (24). For risk assessments, we calculated intraclass correlation coefficients for each specialty to determine the portion of the total variance explained by differences between patients rather than between physician assessments for a given patient. For each patient, z scores of a given specialty were averaged so that each reviewer contributed equally to a specialty's risk assessment. To define the determinants of assessed risk, we analyzed these z scores as dependent variables in stepwise linear regression of case characteristics. Factors identified as important to determining risk were compared across specialty using chi-squared statistics.
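
The normalization step described above can be sketched as follows. This is an illustrative reconstruction, not the study's code: the data layout, function names, and ratings are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which form was used.

```python
from statistics import mean, pstdev


def zscore_by_reviewer(ratings):
    """Convert {reviewer: {case_id: likert_score}} to per-reviewer z scores."""
    out = {}
    for reviewer, scores in ratings.items():
        values = list(scores.values())
        mu = mean(values)
        sd = pstdev(values)  # assumption: population SD, so each reviewer's SD becomes 1
        if sd == 0:
            raise ValueError(f"reviewer {reviewer!r} gave identical ratings")
        out[reviewer] = {case: (s - mu) / sd for case, s in scores.items()}
    return out


def panel_mean(z_scores):
    """Average z scores per case across the reviewers who rated that case."""
    by_case = {}
    for scores in z_scores.values():
        for case, value in scores.items():
            by_case.setdefault(case, []).append(value)
    return {case: mean(values) for case, values in by_case.items()}


# Hypothetical ratings on the four-point scale (1 = slight ... 4 = high).
# Reviewer "B" uses only the top half of the scale but ranks cases similarly.
ratings = {
    "A": {"c1": 1, "c2": 2, "c3": 3, "c4": 4},
    "B": {"c1": 3, "c2": 3, "c3": 4, "c4": 4},
}
z = zscore_by_reviewer(ratings)
specialty_risk = panel_mean(z)
```

After the transformation, reviewer B's compressed ratings span the same range of z scores as reviewer A's, so neither dominates the panel average for a case.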

Outcome
The occurrence of a procedure-related complication was the primary outcome for this analysis. Procedure-related complications were defined as those that were clearly consequences of the procedure and that resulted in prolonged hospitalization or a change in Rankin Scale (25) score of 1 or more points at discharge. This outcome was felt to be important, because it reflected morbidity and resource consumption. Complications were determined independently by two neurologists who were blinded as to treatment modality, and the determinations included the entire course of treatment (22). Agreement on whether a procedure-related complication occurred was "substantial" (23) (κ = 0.89), and disagreements were resolved by discussion between the reviewing neurologists.

It was important to determine whether the risk assessments provided important information not present in multivariable models that included measured case characteristics. The relative contributions of risk assessments to the overall fit of multivariable models of complication risk were evaluated using logistic regression for each procedure. All candidate predictors were entered into the model, and then individual predictors were removed; the fits of these models were compared using the deviance, defined as the difference in –2(log likelihood) (26, 27). This approach provides a unitless measure of the contribution of a given variable to the fit of a complete multivariable model. When deviance is calculated for a model with a single variable removed, the significance of that variable's contribution to the model can be tested using the chi-squared test with 1 df.
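
The deviance comparison described above amounts to a likelihood ratio test. A minimal sketch follows, assuming the log likelihoods of the full and reduced models have already been obtained from a fitting routine; the function names and the two log-likelihood values are hypothetical. The 1-df chi-squared tail probability is computed from the identity P(X > x) = erfc(sqrt(x/2)).

```python
import math


def deviance_difference(loglik_full, loglik_reduced):
    """Difference in -2(log likelihood) between a reduced and a full model."""
    return -2.0 * (loglik_reduced - loglik_full)


def chi2_upper_tail_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 df."""
    if x < 0:
        raise ValueError("a deviance difference for nested models cannot be negative")
    # For 1 df, X = Z**2 with Z standard normal, so P(X > x) = P(|Z| > sqrt(x)).
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical log likelihoods: full model vs. the model with one predictor removed.
dev = deviance_difference(loglik_full=-50.0, loglik_reduced=-53.0)
p_value = chi2_upper_tail_1df(dev)
print(dev, round(p_value, 4))
```

Ranking predictors by the deviance change each one produces when removed gives the kind of relative-contribution ordering reported for the risk assessments.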

In evaluating the relative safety of the procedures, it was necessary to form a final cohort of patients who could have received either therapy. To do this, we used the inclusion question from the blinded review and excluded cases for which the majority of practitioners in either specialty stated that treatment was not indicated. That is, if two out of three neurosurgeons or interventional radiologists judged that treatment of a patient was not justified, that case was eliminated from the final cohort. Patient characteristics were compared before and after exclusion using Student's t test for continuous variables and Fisher's exact test for categorical variables. In the final cohort, complications of the two procedures were compared.
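
The cohort-definition rule above can be written as a short sketch. The function names and vote lists are hypothetical. Because reviewers familiar with a case were excluded from its review, a panel may have fewer than three votes; treating a tie as "not a majority" is an assumption of this sketch, not something stated in the text.

```python
def excluded_by_panel(votes):
    """True if a strict majority of a panel judged treatment not justified.

    votes: list of booleans, True meaning "treatment justified".
    """
    not_justified = sum(1 for v in votes if not v)
    return not_justified * 2 > len(votes)  # assumption: ties do not exclude


def in_final_cohort(surgeon_votes, radiologist_votes):
    """A case is kept only if neither specialty's panel excludes it."""
    return not (excluded_by_panel(surgeon_votes) or excluded_by_panel(radiologist_votes))


# Hypothetical cases: one dissenting neurosurgeon keeps the case in the cohort,
# two dissenting neurosurgeons remove it.
print(in_final_cohort([True, True, False], [True, True, True]))
print(in_final_cohort([True, False, False], [True, True, True]))
```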

We used logistic regression analysis to determine how the risk assessment contributed to comparison of risks with surgery or endovascular therapy. Odds ratios for risk of complications with surgery as compared with endovascular therapy were determined on the basis of models including patient characteristics, models including risk assessments, and models including both. In this way, we could evaluate whether the risk assessment variable suggested that a bias was present in less complete models.

Since risk was likely to be determined differently by neurosurgeons and interventional radiologists, interactions between risk and procedure were likely. That is, surgery might be safer in the group of patients with the lowest risk assessment according to the neurosurgeons and the highest risk according to interventional radiologists. To evaluate this possibility, we divided risk assessments for each specialty into high- and low-risk groups at the mean values. Complications were assessed for all combinations of level of risk. Heterogeneity was tested and odds ratios were combined using Mantel-Haenszel methods (28).
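
For readers unfamiliar with the pooling step, the Mantel-Haenszel combined odds ratio across strata can be sketched as below. Only the point estimate is shown; the heterogeneity test and confidence interval are omitted. The function name and the stratum counts are hypothetical and do not reproduce the study data.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio.

    strata: iterable of (a, b, c, d) counts per stratum, where
      a = surgery with complication,      b = surgery without,
      c = endovascular with complication, d = endovascular without.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("pooled odds ratio is undefined for these counts")
    return numerator / denominator


# Hypothetical counts for two risk strata.
strata = [(10, 10, 5, 15), (8, 12, 4, 16)]
pooled_or = mantel_haenszel_or(strata)
print(round(pooled_or, 2))
```

With a single stratum the estimate reduces to the ordinary cross-product odds ratio, which is a convenient check on the implementation.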

The Stata statistical package (version 5.0; Stata Corporation, College Station, Texas) was used for all analyses.


    RESULTS
 
There were 435 patients treated for unruptured aneurysm during the study period (January 1990–December 1997), of whom 216 (118 surgical, 98 endovascular) met initial inclusion criteria. Angiographic images were unavailable in 37 cases, leaving 179 cases for blinded review.

Inclusion
There was fair agreement (23) between duplicate inclusion assessments made by the same reviewers more than 3 months apart (n = 11; 86 percent agreement; κ = 0.29). Most of the disagreement was produced by a single practitioner on his first day, and reproducibility was substantial when he was eliminated (95 percent agreement; κ = 0.64). Treatment was considered justified in 57–90 percent of the cases by neurosurgeons and in 81–96 percent by interventional radiologists. Agreement was fair between reviewers in a specialty (73–88 percent agreement; κ = 0.27–0.48). The blinded review excluded 49 cases (figure 1), leaving 130 cases in which patients were judged to be candidates for either therapy. Four cases were eliminated by the specialists who had performed the treatment. None of the intrarater disagreements affected inclusion of a case in the final cohort, because other practitioners were unanimous in their decisions, so recommendation of inclusion by the panels was completely reproducible (κ = 1.00).



FIGURE 1. Results from a blinded review of 216 cases meeting initial inclusion criteria in a study of treatment for unruptured aneurysm, University of California, San Francisco, 1990–1997. In 68 surgical cases ("clipped") and 62 endovascular embolization cases ("coiled"), patients were considered treatable by either type of therapy upon blinded review and were included in the final analysis.

 
We used logistic regression to determine which patient and aneurysm factors were independently important to determining that treatment was indicated. Interventional radiologists were more likely to exclude patients who were older (p = 0.04), patients with aneurysms closer to the brain surface (middle cerebral artery, p = 0.001; anterior cerebral artery, p = 0.02), and patients with relatively large aneurysm necks (large ratio of neck to overall diameter, p < 0.001). Among neurosurgeons, patients with older age (p < 0.001), large aneurysm necks (p = 0.02), and aneurysms farther from the outer surface of the brain (cavernous segment of the internal carotid artery, p < 0.001) were more likely to be excluded.

After exclusion of cases in which treatment was considered unjustified by practitioners of either specialty, treatment groups were generally more similar (table 1).


TABLE 1. Characteristics of patients treated for unruptured aneurysm at the University of California, San Francisco, 1990–1997

 
Risk assessment
Neurosurgeons and interventional radiologists identified different patient and aneurysm characteristics as being important to elevated risk (table 2). Intrarater reliability in risk assessment was nearly perfect (97 percent agreement; weighted κ = 0.88; Rs = 0.94; n = 11). There were differences in risk assessment between reviewers in a specialty, with poor-to-fair agreement (weighted κ = 0.12–0.41). Interrater differences were largely due to miscalibration, since intraclass correlations were high (endovascular therapy, R = 0.90; surgery, R = 0.87). In retrospect, a continuous scale for estimating procedural risk with average risk in the middle of the scale would have provided more detailed information while avoiding ceiling and floor effects.


TABLE 2. Factors stated as important in assessment of procedural risk in the treatment of unruptured aneurysm, University of California, San Francisco, 1990–1997

 
The case characteristics associated with preprocedural risk assessments were different between specialties in multivariable linear regression models. For neurosurgeons, independent predictors for higher risk assessments were patient age (p < 0.001), Rankin score at admission (p = 0.001), aneurysm diameter (p < 0.001), ratio of aneurysm neck to overall diameter (p = 0.04), and deeper location (cavernous internal carotid, ophthalmic, basilar terminus, vertebrobasilar/posterior cerebral, all p's < 0.001; supraclinoid internal carotid, p = 0.01). For interventional radiologists, higher risk assessments were associated with aneurysm diameter and ratio of aneurysm neck to overall diameter (both p's < 0.001) and more superficial location (middle cerebral, p < 0.001; anterior cerebral, p = 0.005; vertebrobasilar/posterior cerebral, p = 0.006); interventional radiologists' risk assessments were lower for aneurysms in the cavernous segment (p < 0.001).

Risk and outcome
For each procedure, higher preprocedural risk assessments were associated with greater risk of complications in unadjusted analyses (risk of a procedural complication per standard deviation of assessed risk: for surgical cases, odds ratio (OR) = 3.2, 95 percent confidence interval (CI): 1.6, 6.4; p = 0.001; for endovascular cases, OR = 2.3, 95 percent CI: 1.1, 4.8; p = 0.03). Risk assessments significantly contributed to multivariable models adjusted for case characteristics (table 3). Furthermore, the relative impact of risk assessment on the fit of the complete multivariable models was high, ranking first for surgery cases and second for endovascular cases. Predictors of complications differed for surgery and endovascular therapy (table 3).


TABLE 3. Predicted risk of a procedural complication in the treatment of unruptured aneurysm from multivariable models, according to assessments of risk and case characteristics, University of California, San Francisco, 1990–1997

 
Risk assessments were less accurate predictors of complications across specialty. There was no association between interventional radiologists' risk assessments and risk of a complication from neurosurgery (OR = 1.0, 95 percent CI: 0.61, 1.8; p = 0.86). Although neurosurgeons' risk assessments tended to predict endovascular complications, the association was not significant (odds of a complication for each standard deviation of assessed risk: OR = 2.1, 95 percent CI: 0.96, 4.4; p = 0.06).

Treatment and outcome
Neurosurgeons rated the group of patients treated by endovascular therapy as higher-risk than the surgery group (table 1). Conversely, interventional radiologists rated the surgical cases as higher-risk. When risk assessments were combined across specialties, overall risk assessments were the same in the two groups.

A complication occurred in 46 percent of surgical cases and 23 percent of endovascular therapy cases. Logistic regression models revealed greater disparity in risk of a complication after adjustment for case characteristics alone, after adjustment for overall risk assessments alone, and in a full model with risk assessments and case characteristics included (table 4).


TABLE 4. Contribution of risk assessments to comparison of procedural risks* in the treatment of unruptured aneurysm, University of California, San Francisco, 1990–1997

 
Since the neurosurgeons and the interventional radiologists assessed risk differently, reflecting the particular risks of each procedure, we evaluated complications for each combination of surgical and endovascular risk assessments (figure 2). Complications were less frequent for endovascular therapy in all strata, even in the group classified as low-risk by neurosurgeons and high-risk by interventional radiologists. There was no evidence of heterogeneity between risk groups (p = 0.67), and the combined odds ratio for surgery as compared with endovascular therapy across strata was 3.6 (95 percent CI: 1.4, 7.6; p = 0.004).



FIGURE 2. Risk of a complication with the use of endovascular embolization (black bars) and surgery (mottled bars) in the treatment of unruptured aneurysm, according to risk assessments made by practitioners in each specialty, University of California, San Francisco, 1990–1997. The specialists' risk assessments, determined from blinded review, were dichotomized at the mean values.

 

    DISCUSSION
 
Blinded prospective review attempts to advance observational study methodology by recreating the flow of information in a prospective study in order to evaluate indications and risks of treatment in an unbiased manner. For the design to be useful, judgments about whether treatment is indicated and assessments of treatment risk must be reproducible and valid.

Choosing the best treatment is only important for patients who are candidates for more than one therapy. Therefore, observational studies of treatment effects should attempt to define a cohort of patients who could have received any of the compared therapies, as is required for inclusion in randomized trials (7). However, there is often disagreement between physicians regarding perceived indications for treatment, which is evident in the practice of obtaining second opinions.

Blinded prospective review offers a method of evaluating perceived indications for treatment without bias due to hindsight about a specific treatment decision or outcome. We found "fair" agreement between intrarater assessments separated by more than 3 months, and agreement improved dramatically when a single misguided reviewer was removed from the analysis. As expected, there was variability between practitioner pairs in determination of whether treatment was justified, with kappa values of 0.27–0.48 between practitioners in a specialty. The neurosurgeon with the lowest number of treated cases most frequently rated treatment as not indicated, as would be anticipated for a practitioner who is less comfortable with a procedure. To reduce the effect of an individual physician's input in a medical system that often includes second opinions, we used the majority assessment of panels of three practitioners to determine whether a case would be included in the final cohort. This system produced perfect agreement in test-retest analysis, confirming it as a reproducible approach to defining a cohort of patients who could have received either therapy.

It is impossible to determine whether treatment in a given case is justified, since procedural complications will vary between practitioners within a specialty and because the risks of not treating are often poorly understood. Therefore, it is impossible to directly validate a decision to recommend treatment. However, several findings offer reassurance that inclusion decisions were valid in this study. First, very few of the cases eliminated were excluded by the specialists who had performed the treatment (figure 1). Second, physicians' determinations were related to a number of perceived and confirmed risk factors for poor outcomes. Third, risk factors generally differed less between treatment groups after the exclusion process (table 1). Together, these findings argue for the validity of the exclusion component of the blinded prospective review. Panels of practitioners who are blinded with regard to treatment modality and outcome can reproducibly create a cohort of patients judged to be candidates for all treatments being evaluated.

Assurance that members of treatment groups could have received either therapy does not guarantee that preprocedure prognosis and case complexity are similar in the treatment groups. Prognostication is the role of the treating physician, in a process involving synthesis of a large number of clinical variables, study results, and experience. Physicians are often skeptical, perhaps correctly, that any combination of factors in a multivariable model is adequate to control for preprocedure prognosis (1). Practitioners' concerns about residual uncontrolled confounding often cannot be addressed in observational studies, and this may reduce the impact of the findings.

Defining practitioners' perceptions of preprocedural risk allows incorporation of the complex, ill-defined prognostication process into a study. Without knowledge of case treatment and outcome, practitioners commit to an assessment of preprocedural risk, the latent variable researchers attempt to model with multivariable statistics. In this demonstration, we found physicians' assessments of preprocedural risk to be highly reproducible upon test-retest analysis (weighted κ = 0.88, n = 11) and highly correlated between physicians in a specialty (for endovascular cases, R = 0.90; for surgery, R = 0.87). These results suggest that reproducible assessment of preprocedural risk is possible.

To determine whether risk assessments were valid representations of patient prognosis, we tested risk assessments as predictors of procedural complications. Overall risk assessments for each specialty were generated by combining risk assessments within each specialty after normalization. Neurosurgeons predicted complications of surgery and interventional radiologists predicted complications of endovascular therapy, which suggests that the risk assessments were meaningful.

Next we sought to determine whether physician risk assessment contributed information that was unavailable in a multivariable model of case characteristics. For both procedures, multivariable models that included all case characteristics were significantly improved by the addition of risk assessment variables, and risk assessments ranked first and second in impact on the overall models (table 3). Thus, the risk assessments provided important information that was missing from the collection of case characteristics.

Finally, it was important to determine whether the risk assessment process enhanced the validity of the study. In this demonstration, the crude risk of complications was much greater in the group treated surgically, and the difference persisted after adjustment for case characteristics (table 4). Residual confounding by indication could account for the outcome difference even after adjustment for case characteristics, since an important prognostic factor may not have been included. However, overall risk assessments were identical in the two treatment groups, and surgeons rated the group treated by endovascular therapy as higher-risk (table 1). Therefore, it would be difficult for a neurosurgeon to argue that he or she has treated the higher-risk cases: He/she has indicated otherwise. The possibility that a prognostic factor is unbalanced between groups cannot be ruled out, but this factor would need to be unknown or underappreciated by practitioners of the compared therapies. Knowledge of a prognostic factor is required for confounding by indication to occur, since only then can the prognostic factor affect treatment decisions (1). Therefore, unknown prognostic factors may be confounders, but they do not produce confounding by indication, the real bane of observational treatment studies.

Even though endovascular therapy may be safer overall, it is possible that some subgroups of patients would be treated more safely with surgery. Specifically, cases with high risk for endovascular therapy and low risk for surgery may be best treated by surgery. Furthermore, risk was assessed differently for the two procedures, and prognostic factors also differed, so selection could potentially improve care even further than treating all patients with endovascular therapy. The risk assessment process allows evaluation of this possibility. Patients were stratified into high- and low-risk groups for each procedure (figure 2). We found that complications were more frequent for surgery in all strata, including the one with high endovascular risk and low surgical risk. Therefore, the finding of greater safety for endovascular therapy was robust across risk groups.

This study design will not be useful for all therapies. For the design to control confounding by indication, information used to make treatment decisions and to prognosticate must be accurately recreated and presented to practitioners as though the case were being treated prospectively. If important information is available to the treating physician but not to the reviewer, inadequate estimation of the preprocedural risk may result, and confounding by indication may remain in the final analysis. The design was practical in our study because abstracted patient information and radiographic images are routinely the only information used to identify candidates for the compared therapies. If a patient examination had been part of the preprocedural decision-making, it might not have been possible to accurately reproduce the clinical pretreatment scenario, since an examination may identify factors that are not readily quantifiable. This limits the usefulness of the design to those situations in which the important clinical variables can be accurately recreated and presented to the reviewers.

Generalization outside the study center may be affected by variability in case selection and risk assessment. However, this is not a problem unique to this study design. Certainly, procedural skills also vary between institutions. Integrating knowledge about case selection and perceived risk clarifies a center's practice pattern. This may enhance generalizability by allowing readers to compare their case selection with that in the study.

Randomized controlled trials provide the only reliable method with which to assure validity in studies of treatment effects (1, 2). However, randomized trials are not always possible because of cost, practicalities, and ethical concerns (3, 7, 29). Therefore, reliable observational methods are required. Blinded prospective review may offer a method for improving observational studies of treatment effects by identifying a cohort of patients who could have received either of the compared therapies, and by incorporating practitioners' assessments of preprocedural risk into the analysis. In this way, confounding by indication can be identified and controlled more completely than in traditional, retrospective study designs.


    ACKNOWLEDGMENTS
 
This work was funded by an unrestricted grant from Target Therapeutics, Inc. (Fremont, California), a manufacturer of platinum coils used for coil embolization. The author is a clinical research fellow of the National Stroke Association and is supported financially by grant NS02042 from the National Institute of Neurological Disorders and Stroke.

The author thanks Drs. Ira Tager and Alan Hubbard for assistance with design development and for important editorial comments.


    NOTES
 
(Reprint requests to Dr. S. Claiborne Johnston at this address).


    REFERENCES

  1. Miettinen OS. The need for randomization in the study of intended effects. Stat Med 1983;2:267–71.
  2. Green SB, Byar DP. Using observational data from registries to compare treatments: the fallacy of omnimetrics. Stat Med 1984;3:361–73.
  3. Feinstein AR. Current problems and future challenges in randomized clinical trials. Circulation 1984;70:767–74.
  4. Grobbee DE, Hoes AW. Confounding and indication for treatment in evaluation of drug treatment for hypertension. BMJ 1997;315:1151–4.
  5. Walker AM. Confounding by indication. Epidemiology 1996;7:335–6.
  6. Poses RM, Smith WR, McClish DK, et al. Controlling for confounding by indication for treatment: are administrative data equivalent to clinical data? Med Care 1995;33(suppl):AS36–46.
  7. Horwitz RI, Viscoli CM, Clemens JD, et al. Developing improved observational methods for evaluating therapeutic effectiveness. Am J Med 1990;89:630–8.
  8. Johnston SC, Selvin S, Gress DR. The burden, trends, and demographics of mortality from subarachnoid hemorrhage. Neurology 1998;50:1413–18.
  9. Schievink WI, Wijdicks EF, Parisi JE, et al. Sudden death from aneurysmal subarachnoid hemorrhage. Neurology 1995;45:871–4.
  10. Schievink WI. Intracranial aneurysms. N Engl J Med 1997;336:28–40.
  11. Solomon RA, Fink ME, Pile-Spellman J. Surgical management of unruptured intracranial aneurysms. J Neurosurg 1994;80:440–6.
  12. Guglielmi G, Vinuela F, Dion J, et al. Electrothrombosis of saccular aneurysms via endovascular approach. Part 2: preliminary clinical experience. J Neurosurg 1991;75:8–14.
  13. Guglielmi G, Vinuela F, Sepetka I, et al. Electrothrombosis of saccular aneurysms via endovascular approach. Part 1: electrochemical basis, technique, and experimental results. J Neurosurg 1991;75:1–7.
  14. Bryan RN, Rigamonti D, Mathis JM. The treatment of acutely ruptured cerebral aneurysms: endovascular therapy versus surgery. AJNR Am J Neuroradiol 1997;18:1826–30.
  15. Johnston SC, Dudley RA, Gress DR, et al. Surgical and endovascular treatment of unruptured cerebral aneurysms at university hospitals. Neurology 1999;52:1799–805.
  16. The International Study of Unruptured Intracranial Aneurysms Investigators. Unruptured intracranial aneurysms—risk of rupture and risks of surgical intervention. N Engl J Med 1998;339:1725–33.
  17. Raaymakers TW, Rinkel GJ, Limburg M, et al. Mortality and morbidity of surgery for unruptured intracranial aneurysms: a meta-analysis. Stroke 1998;29:1531–8.
  18. Vinuela F, Duckwiler G, Mawad M. Guglielmi detachable coil embolization of acute intracranial aneurysm: perioperative anatomical and clinical outcome in 403 patients. J Neurosurg 1997;86:475–82.
  19. Guglielmi G, Vinuela F, Duckwiler G, et al. Endovascular treatment of posterior circulation aneurysms by electrothrombosis using electrically detachable coils. J Neurosurg 1992;77:515–24.
  20. Cognard C, Weill A, Castaings L, et al. Intracranial berry aneurysms: angiographic and clinical results after endovascular treatment. Radiology 1998;206:499–510.
  21. Khanna RK, Malik GM, Qureshi N. Predicting outcome following surgical treatment of unruptured intracranial aneurysms: a proposed grading system. J Neurosurg 1996;84:49–54.
  22. Johnston SC, Wilson CB, Halbach VV, et al. Endovascular and surgical treatment of unruptured cerebral aneurysms: comparison of risks. Ann Neurol 2000;48:11–19.
  23. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.
  24. Daniel WW. Biostatistics: a foundation for analysis in the health sciences. 6th ed. New York, NY: John Wiley and Sons, Inc, 1995.
  25. Rankin J. Cerebral vascular accidents in patients over the age of 60. II. Prognosis. Scot Med J 1957;2:200–15.
  26. Krause N, Ragland DR, Greiner BA, et al. Physical workload and ergonomic factors associated with prevalence of back and neck pain in urban transit operators. Spine 1997;22:2117–26.
  27. Selvin S. Practical biostatistical methods. Belmont, CA: Wadsworth Publishing Company, 1995.
  28. Kleinbaum DG, Kupper LL, Morgenstern H. Epidemiologic research: principles and quantitative methods. New York, NY: Van Nostrand Reinhold, 1982.
  29. Greenfield S. The state of outcome research: are we on target? (Editorial). N Engl J Med 1989;320:1142–3.
Received for publication February 7, 2000. Accepted for publication February 26, 2001.