Unidad de Investigación, Hospital de Galdakao, Galdakao, Vizcaya,
1 Departamento de Matemática Aplicada, Estadística e Investigación Operativa, Universidad del País Vasco, Lejona, Vizcaya,
2 Servicio de Traumatología, Hospital de Bajo Deba, Mendaro, Guipuzcoa,
3 Servicio de Traumatología, Hospital de Santiago, Vitoria, Álava,
4 Servicio de Traumatología, Hospital del Bidasoa, Fuenterrabia, Guipuzcoa,
5 Servicio de Traumatología, Hospital de Txagorritxu, Vitoria, Alava and
6 Servicio de Traumatología, Hospital de Galdakao, Galdakao, Vizcaya, Spain
Abstract |
---|
Methods. Patients with a diagnosis of osteoarthritis who were undergoing THR in five public hospitals in Spain were included consecutively in the study during a 1-yr period. The appropriateness of the indication was judged by explicit criteria developed using a multidisciplinary approach. Complications were measured 3 months after surgery. One year after discharge, pain, functional limitation and general health were measured.
Results. After evaluation of 583 patients, 82 (13.6%) were considered to have undergone inappropriate procedures, and for 279 (46.2%) patients indication for the procedure was considered uncertain. Differences were found in the rate of appropriateness among some centres. One year after discharge, the perception of general health was slightly better in those patients who had been judged to have undergone an appropriate procedure.
Conclusions. The study identified a moderate percentage of inappropriately performed THR. When considered together with those cases that were judged to have uncertain indications, the results indicate that further studies should be done to identify patients who may have an inadequate benefit:risk ratio from this procedure.
KEY WORDS: Hip prosthesis, Appropriateness, Utilization review, Quality of health care.
Introduction |
---|
Large, inexplicable geographical variations in the population-based service rates for many surgical procedures, including orthopaedic procedures [5, 6], have been found. Total hip replacement (THR) has been practised widely since its development by Charnley [7a]. The rates of THR, however, vary widely both within and between countries [7b], a finding that cannot be explained solely by differences in the prevalence of hip disease. Variations in clinical decision-making, among other factors, may also contribute [8].
Although public hospitals increasingly have similar numbers of technological and human resources, the use of this procedure seems to vary from one centre to another, even though patient characteristics do not differ from region to region within Spain. Although the benefit of the intervention has been proved extensively for patients with severe hip pain or functional limitations [9, 10], joint replacement is also performed in patients with less severe symptoms [11, 12].
The goal of this study was to apply explicit criteria, developed by the use of a multidisciplinary approach, to examine the appropriateness of the indications for THR in various hospitals in Spain.
Patients and methods |
---|
The first round of ratings was performed before the panel meeting. Results were collated and presented to the panellists at a one-day meeting. Each panellist received the anonymous ratings of all the other panellists as well as a reminder of his or her own ratings. After extensive discussion, the panellists revised the indications according to the definition given above. Each panellist rated 216 separate indications. The value of the stages of the RAND® process is to improve intrapanel agreement and consensus on the definitions of the scenarios after the multidisciplinary discussion. This process provides some face validity.
Finally, all indications considered appropriate in the second round were rated by the main panel on a 9-point scale to determine those that were considered necessary. Necessity was defined as meaning that a procedure is not only appropriate but crucial and that it would be improper not to recommend it in a given clinical situation, as defined by previous authors [20]. Indications with a median necessity rating from 7 to 9 and with no disagreement were considered necessary. The other appropriate indications were considered elective.
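The necessity rule described above can be sketched in a few lines. This is only an illustration: the text here does not state how the panel defined disagreement, so the rule used below (at least three ratings in the 1-3 range and at least three in the 7-9 range) and the function name `classify_necessity` are assumptions, not the authors' definition.

```python
from statistics import median


def classify_necessity(ratings):
    """Label an already-appropriate indication as 'necessary' or 'elective'.

    ratings: necessity scores on the 9-point scale, one per panellist.
    The disagreement rule is an ASSUMPTION made for illustration only.
    """
    low = sum(1 for r in ratings if 1 <= r <= 3)
    high = sum(1 for r in ratings if 7 <= r <= 9)
    disagreement = low >= 3 and high >= 3  # hypothetical rule

    # Necessary: median necessity rating from 7 to 9 and no disagreement.
    if 7 <= median(ratings) <= 9 and not disagreement:
        return "necessary"
    # Every other appropriate indication is treated as elective.
    return "elective"
```

For instance, `classify_necessity([8, 8, 9, 7, 8, 9, 7, 8, 9])` falls in the necessary group, whereas a panel split between very low and very high ratings is labelled elective under the assumed rule even when its median is high.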
Intra- and interpanel reliabilities were studied. Weighted kappa statistics were 0.81 for a test-retest intrapanel reliability study and 0.77 when comparison was made with a second panel of nine different orthopaedic surgeons. More details are given elsewhere [21].
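For readers unfamiliar with the statistic, the following is a minimal sketch of Cohen's weighted kappa for two sets of ratings of the same indications. The weighting scheme used in the study is not stated here, so the linear default (`power=1`), the function name and the argument layout are assumptions.

```python
def weighted_kappa(ratings_a, ratings_b, categories, power=1):
    """Cohen's weighted kappa for two lists of ratings of the same items.

    categories: the ordered rating categories.
    power=1 gives linear disagreement weights, power=2 quadratic ones
    (which scheme the study used is an ASSUMPTION left to the caller).
    """
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    n = len(ratings_a)

    # Observed proportions for every pair of categories.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) ** power  # larger gaps count as more disagreement
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]

    # Undefined (division by zero) if all ratings fall in a single category.
    return 1.0 - weighted_observed / weighted_expected
```

Identical rating lists give a kappa of 1, and agreement no better than chance gives a value near 0.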
Data collection
This prospective observational study took place in five large public hospitals (four affiliated to a university and one community-based); it started in December 1996 and ended in December 1997. All the hospitals belonged to the Basque Health Service-Osakidetza, a local government agency in the Basque Country, and to the Spanish National Health Service. The identities of the hospitals and surgeons were not revealed in the research reports. Physicians in each hospital were blinded to the study goals.
All consecutive patients with a diagnosis of osteoarthritis who were undergoing THR surgery and were followed in any of the five hospitals were included in the study. Patients with malignant, severe or psychiatric diseases, and those who were unable to communicate or who refused to participate, were excluded. Of 600 patients who fulfilled the selection criteria, 97.2% agreed to participate. To collect data and determine appropriateness, we developed a computerized algorithm based on the results produced by our panel. We also developed data collection questionnaires that included variables before the intervention, admission and discharge, the intervention itself, and complications at 3 months after discharge. Besides the variables belonging to the appropriateness algorithm, which have been mentioned previously, other variables collected included sociodemographic data, patients' family support, height, weight, main complaint, 12 comorbidities (diabetes, hypertension, cardiac disease, dementia, depression, chronic obstructive pulmonary disease, stroke, cancer, renal disease, hepatic disease, anaemia and arthritis), other joints affected, previous interventions, intervention characteristics, local and general complications peri- and post-intervention, death, and length of hospital stay.
A single trainee reviewer collected the data with a standardized questionnaire. The reviewer was blinded to the specific study goals. He located all patients who fulfilled the selection criteria and interviewed them before surgery, at which time their permission for the study was also obtained. He collected data from each patient on pain, functional limitation and previous non-surgical treatment. Further data were recorded from medical records (diagnosis, age, ASA grade). A standardized questionnaire also was completed by a surgeon at each centre about bone quality, based on X-rays. At discharge, data related to admission and the intervention were gathered by reviewing the patients' medical records. Three months after discharge, all medical records were again reviewed to determine if any patient had been readmitted, died, or had any complication resulting from the intervention.
One year after discharge, all patients received a new questionnaire that included 11 items. Two questions measured pain intensity and the need for medication. Seven were related to functional status and the need for aid, and two were concerned with general health. Of the 583 patients, six had died and 21 had moved and could not be located. Of the 556 who finally received the questionnaire, 448 (80.6%) answered after two mail reminders. There were no statistically significant differences between responders and non-responders regarding sociodemographic variables, clinical characteristics, including pain and functional limitation, and appropriateness evaluation.
Statistical analysis
The unit of study was the patient. In cases where two interventions were done (3.5% of cases) we selected the first. Descriptive statistics, frequency tables and the means and standard deviations were obtained. The χ² test and Fisher's exact test were used to test for statistically significant differences among proportions. For continuous variables (e.g. age), analysis of variance was performed in the univariate analysis.
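As an illustration of the comparison of proportions, the sketch below computes the Pearson chi-square statistic and its degrees of freedom for a contingency table. The counts in the example are hypothetical, not study data, and the function name is an assumption. No continuity correction is applied, and the P value would still have to be read from the chi-square distribution with the returned degrees of freedom.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    table: list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are two centres, columns are
# inappropriate / not inappropriate procedures.
example_table = [[10, 20], [20, 10]]
example_statistic, example_dof = chi_square_statistic(example_table)
```

When expected counts are small, an exact test such as Fisher's is the usual alternative, which is presumably why both tests are named above.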
The odds ratio and the 95% confidence interval (CI) for an inappropriate intervention were calculated for each of the hospitals, taking as the reference the hospital with the lowest rate of inappropriate cases (hospital 4).
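A minimal sketch of this calculation for one hospital against the reference hospital follows. The interval method is not specified in the text, so the log-based (Woolf) interval used here is an assumption, as are the function name and the example counts, which are hypothetical rather than study data.

```python
import math


def odds_ratio_ci(inapp_hosp, other_hosp, inapp_ref, other_ref, z=1.96):
    """Odds ratio of an inappropriate intervention versus a reference hospital.

    inapp_hosp / other_hosp: inappropriate and remaining cases in the hospital.
    inapp_ref / other_ref: the same counts in the reference hospital.
    Returns (odds_ratio, lower, upper) using the Woolf log method (ASSUMED).
    All four counts must be non-zero.
    """
    odds_ratio = (inapp_hosp * other_ref) / (other_hosp * inapp_ref)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / inapp_hosp + 1 / other_hosp + 1 / inapp_ref + 1 / other_ref)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 of 100 inappropriate in one hospital,
# 10 of 100 in the reference hospital.
example_or, example_lower, example_upper = odds_ratio_ci(20, 80, 10, 90)
```

An interval that includes 1 means the hospital's rate of inappropriate cases cannot be distinguished from that of the reference hospital at the 5% level.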
Principal components analysis (PCA) [22] was used to establish the appropriate relationship in each domain of the questionnaire employed at 1 yr. A total score for pain and functional limitation was obtained. PCA was used to determine the weight to assign to each item in the scale. The items were categorized as minor, moderate and severe, as in the pre-intervention survey.
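How item weights can be derived from a first principal component is sketched below, using power iteration on the item correlation matrix. This is not the SAS procedure the authors ran. Rescaling the loadings to sum to 1 is an assumption made for illustration, it presumes the items are positively correlated, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def first_component_weights(rows, iterations=200):
    """Item weights from the first principal component of the correlation matrix.

    rows: one list of numeric item scores per patient.
    Assumes no item is constant and items are positively correlated.
    """
    n = len(rows)
    n_items = len(rows[0])

    # Standardize each item (column) to mean 0 and standard deviation 1.
    z = []
    for j in range(n_items):
        column = [row[j] for row in rows]
        m, s = mean(column), pstdev(column)
        z.append([(x - m) / s for x in column])

    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / n
             for j in range(n_items)] for i in range(n_items)]

    # Power iteration converges to the leading eigenvector (first component).
    v = [1.0] * n_items
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n_items))
             for i in range(n_items)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]

    total = sum(v)
    return [x / total for x in v]  # ASSUMED rescaling: weights sum to 1


def weighted_score(items, weights):
    """Total score for one patient as the weighted sum of item scores."""
    return sum(x * w for x, w in zip(items, weights))
```

With weights in hand, each patient's total score is a weighted sum of the item responses, which can then be cut into minor, moderate and severe categories.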
Effects were considered significant at P < 0.05 unless otherwise noted. All statistical analyses were performed using SAS for Windows, version 6.12 [23].
Results |
---|
Twelve comorbidities were recorded. There were no differences in comorbidity rates among the five hospital samples or appropriateness groups, and there was no relationship with the study outcome. Other variables, such as mean age, proportion of males, patient's family support, other joints affected and previous hip interventions, also had no effect on the study outcome. Surgical risk, assessed by the ASA classification system, differed among centres (Table 1).
One year after discharge, 448 patients answered a follow-up questionnaire. Among those who underwent an inappropriate procedure, 8% considered their health poor at that time; among those who underwent an uncertain intervention, 4% considered their health to be poor, and of those who underwent an appropriate intervention, 1% considered their health to be poor. These differences were statistically significant.
When their present health status was compared with that before the intervention, 8% of those who underwent an inappropriate intervention reported that their health was poorer now. Of those who underwent an uncertain or appropriate procedure, the responses were 4 and 2% respectively. These differences were not statistically significant. Pain and functional limitation were similar among the three appropriateness categories 1 yr after discharge, although the improvement from before to after the intervention was greater for those who had appropriate indications (Table 4).
Discussion |
---|
The present study assessed prospectively the appropriateness of surgical indications for patients with osteoarthritis undergoing THR in a region of Spain in 1997, using appropriateness criteria developed by a national panel of experts. Applied as a screening tool for patients scheduled for surgery, these appropriateness criteria indicate a significant percentage of potentially inappropriate procedures, compared with other studies in which similar methods have been used [27, 28]. Nevertheless, this result implies that there is variation and uncertainty in the decision-making process regarding this procedure.
There were differences in the inappropriateness rates among centres. The fact that all these hospitals belong to the same public health system and have similar technological and human resources, and that there were no differences in the main sociodemographic and clinical variables of the patients indicates that further study is needed. Did the hospitals differ sufficiently in quality of care or managerial characteristics to justify these results? Was there any other variable that may explain the differences? These questions were beyond the scope of our study, but should be a focus for evaluation and improvement [29].
In contrast to a study of another procedure based on similar methodology [30], we did not find significant associations between the level of appropriateness and peri- or post-intervention rates of complications or other quality indicators, such as the death rate and the length of hospital stay. Nevertheless, the fact that those patients whose interventions were considered inappropriate had complication rates similar to those of other categories of patients implies an important risk to the patient. If the procedure was performed inappropriately, the risk to the patient would have been unnecessary. Furthermore, our results indicated that patients who underwent appropriate procedures reported a benefit at 1 yr, but for those who underwent inappropriate procedures the benefit, although present, was minor.
As in previous studies of other procedures [30-32], the percentage of indications considered uncertain was high. Our panel criteria were able to help us resolve 54% of the indications, but in the remaining 46%, concerning mainly those patients with moderate pain or functional limitation, the instrument did not allow a decision to be made. This indicates that the panel could not come to an agreement and that the instrument was not sufficiently sensitive. To find out more about the benefits achieved by the patients, prospective studies (randomized clinical trials if feasible) are necessary to measure all comprehensive outcomes, clinical as well as subjective, such as quality of life and patient satisfaction [33]. Prospective studies will also be able to measure changes in health after the intervention. We attempted to answer this question, in part, by measuring the patients' general health, pain and functional limitation 1 yr after discharge. Although the pain and functional limitation were similar among the three appropriateness categories, those patients who underwent an appropriate intervention perceived their general health to be slightly better than did patients who underwent an inappropriate procedure. This could indicate that patients who underwent an appropriate procedure improved their functional limitation and had less pain to a greater degree than those who underwent an inappropriate procedure. In addition, these patients may have had other sources of pain or disability that would not have responded to THR, or they were made ill by THR. It must be noted that pain (mainly) and functional limitation are key variables in the panel decision-making process. This has been shown in other studies [34], and should be taken into account.
The only previous study in which explicit criteria were developed to assess THR oriented its 120 indications to general practitioners who referred patients to specialists, and did not include surgical risk or bone quality assessment in the algorithm [35]. The field study carried out by that group using these criteria [36] reported percentages of pain and functional limitation similar to those found in our study. Their retrospective design presented some important methodological problems. Their review was based on medical records, and the researchers lacked relevant data for a high percentage (almost 50%) of the 329 patients recruited. When they applied their algorithm, the inappropriateness rates ranged from 4.4 to 15.8%. In our study, the prospective design allowed us to capture and follow all patients who fulfilled the selection criteria and to minimize losses.
There are two main questions to be addressed: were the explicit criteria valid, and were 14% of the indications really inappropriate? One of the major criticisms of the RAND method has been the unknown extent to which scientifically validated elements enter into the panel's decision-making process. Panellists had access to an up-to-date literature review, and the ratings of the indication for other procedures have been shown to reflect this knowledge. However, when few rigorous, randomized, controlled trials exist, as in the case of THR, the panel's judgement becomes more important than scientific data about efficacy, because the data do not exist [37]. As suggested by some authors [37, 38], this method may be useful when comparing the level of appropriateness among populations but not when directing the care of individual patients. When the method has been used as a utilization review tool, indications considered inappropriate have been subjected to individualized and thorough revision before being considered inappropriate [39]. In our case, this was not possible because of the absence of clinically relevant data in the medical record. The revision should have taken place at the same time as the criteria were applied, in order to establish which other factors may have influenced the final decision.
The composition of the panel may also have affected the ratings of appropriateness, as many studies have shown [40, 41]. Panels consisting of individuals from different specialties tend to rate procedures as less appropriate and more inappropriate than those representing one specialty [42]. In our case, we used the criteria developed by a panel of orthopaedic surgeons, which tended to be more liberal. When developing our criteria, we followed recommendations made for studies of this kind [43]. Regardless of the criticisms of this method, it has been accepted as an important tool in the evaluation of the care that is provided and the study of variations in it [28, 44].
The method of data collection had some limitations. The sole blinded reviewer was a physician trained to assess and record the main variables of the algorithm in a standardized manner so as to reduce the chance of bias. However, the quality of the important variables in the algorithm, such as pain and functional limitation, depended on his assessment. This introduced the possibility of information bias, which may have influenced the final judgement.
Discrepancies between physician and patient evaluations of disease have been reported for several diseases, including the one examined in this study [45]. Otherwise, why did patients with moderate pain or functional limitation undergo the procedure? Other variables, such as patient preferences and special social circumstances, may have influenced the physicians' decision-making. Also, the number of surgeons performing THR in each centre, the overall decision-making process and the way the service or hospital is organized may lead to differences among centres. Finally, although our data might be generalizable to some public centres in Spain, they would not be generalizable to other hospitals in developed countries that have similar health systems.
Preventing the overuse of THR would have a major impact on medical practice and costs. However, underuse of a medical procedure, which was not investigated in this study but may exist, has been recognized to be an equally important factor in determining the quality of care [46].
In conclusion, this prospective observational study demonstrates the gap that exists between actual day-to-day clinical practice and the explicit criteria that are generally accepted to reflect the appropriateness of care. Even as a screening tool, these criteria indicate that there is a considerable percentage of uncertain and inappropriate indications, and the patients involved may be at greater risk of generally poor health. Further research is needed on the effectiveness of THR compared with other alternatives [47]. Future studies should consider the benefits that patients achieve by undergoing this procedure in specific situations, such as those described in this study. This will require medium- to long-term follow-up studies to determine all relevant changes experienced by the patients. Meanwhile, efforts should be made to determine where the best care is provided and the reasons that explain the differences among centres.
Acknowledgments |
---|
Notes |
---|
References |
---|