Evaluation by explicit criteria of the use of total hip joint replacement

J. M. Quintana, I. Aróstegui¹, J. Azkarate², J. I. Goenaga³, I. Guisasola⁴, A. Alfageme⁵ and A. Diego⁶

Unidad de Investigación, Hospital de Galdakao, Galdakao, Vizcaya,
¹ Departamento de Matemática Aplicada, Estadística e Investigación Operativa, Universidad del País Vasco, Lejona, Vizcaya,
² Servicio de Traumatología, Hospital de Bajo Deba, Mendaro, Guipuzcoa,
³ Servicio de Traumatología, Hospital de Santiago, Vitoria, Álava,
⁴ Servicio de Traumatología, Hospital del Bidasoa, Fuenterrabia, Guipuzcoa,
⁵ Servicio de Traumatología, Hospital de Txagorritxu, Vitoria, Álava and
⁶ Servicio de Traumatología, Hospital de Galdakao, Galdakao, Vizcaya, Spain


    Abstract
 
Objective. To evaluate the appropriateness of the use of total hip replacement (THR) using explicit criteria developed by an expert panel.

Methods. Patients with a diagnosis of osteoarthritis who were undergoing THR in five public hospitals in Spain were included consecutively in the study during a 1-yr period. The appropriateness of the indication was judged by explicit criteria developed using a multidisciplinary approach. Complications were measured 3 months after surgery. One year after discharge, pain, functional limitation and general health were measured.

Results. After evaluation of 583 patients, 82 (13.6%) were considered to have undergone inappropriate procedures, and for 279 (46.2%) patients indication for the procedure was considered uncertain. Differences were found in the rate of appropriateness among some centres. One year after discharge, the perception of general health was slightly better in those patients who had been judged to have undergone an appropriate procedure.

Conclusions. The study identified a moderate percentage of inappropriately performed THR. When considered together with those cases that were judged to have uncertain indications, the results indicate that further studies should be done to identify patients who may have an inadequate benefit:risk ratio from this procedure.

KEY WORDS: Hip prosthesis, Appropriateness, Utilization review, Quality of health care.


    Introduction
 
The rapid development of costly medical technologies and services and the need for rationing health-care resources increasingly reduce our ability to give all patients the most beneficial care. Maintaining or improving the quality of care is a major concern and a challenge for clinicians, researchers and policy-makers [1]. The appropriateness of the use of medical procedures is one of the important elements of quality of care. The ability to maintain or enhance the quality of care in an increasingly cost-conscious environment may depend on our ability to determine the appropriateness of care. As defined, a procedure is considered appropriate if its health benefit exceeds its health risk by a sufficiently wide margin that the procedure is worth performing [2]. Overuse of a procedure occurs when a procedure is performed for an inappropriate indication. Health-care systems should function in such a way that appropriate care is increased and inappropriate care decreased. Reducing overuse should enhance the quality of care and decrease medical costs [3, 4].

Large, inexplicable geographical variations in the population-based service rates for many surgical procedures, including orthopaedic procedures [5, 6], have been found. Total hip replacement (THR) has been practised widely since its development by Charnley [7a]. The rates of THR, however, vary widely both within and between countries [7b], a finding that cannot be explained solely by differences in the prevalence of hip disease. Variations in clinical decision-making, among other factors, may also contribute [8].

Although public hospitals increasingly have similar numbers of technological and human resources, the use of this procedure seems to vary from one centre to another, even though patient characteristics do not differ from region to region within Spain. Although the benefit of the intervention has been proved extensively for patients with severe hip pain or functional limitations [9, 10], joint replacement is also performed in patients with less severe symptoms [11, 12].

The goal of this study was to apply explicit criteria, developed by the use of a multidisciplinary approach, to examine the appropriateness of the indications for THR in various hospitals in Spain.


    Patients and methods
 
Development of explicit criteria
Criteria for measuring the appropriateness of the use of THR were developed according to the RAND appropriateness method (RAM), an explicit method that has been described previously [13]. We used the following steps to implement the method. First, an extensive literature review was performed to summarize existing knowledge concerning the efficacy, effectiveness, risks and costs of THR and opinions concerning its use in patients with a diagnosis of osteoarthritis. Secondly, on the basis of this review, we developed a comprehensive and detailed list of 216 clinical scenarios (indications), each mutually exclusive and clinically specific, in which THR might be performed on these patients. Each indication was described in sufficient detail that patients with a given indication would be reasonably homogeneous. These indications included the following variables: age, pain (categorized as minor, moderate or severe on the basis of the need for medication and its effect on pain; relationship to rest, sleep or night disturbance; and rhythm and intensity [14, 15]), functional limitation assessment (categorized as mild, moderate or severe on the basis of the American College of Rheumatology (ACR) classification [16] and the need for a mobility aid), bone quality (measured on X-rays using the classification of Singh et al. [17]), surgical risk (based on the ASA criteria [18]), and previous non-surgical treatments performed or not performed based on current protocol [19] (for further definitions see Fig. 1 and Appendix 1). Thirdly, we selected a national panel that included nine orthopaedic surgeons. The panellists were nationally recognized specialists in their field, selected from a list provided by our Orthopaedic Surgeons Society. The panellists were provided with the literature review and the list of indications, and they rated each indication based on the appropriateness of performing THR, considering the average patient and the average physician in 1997.
Appropriateness was defined as meaning that the expected health benefit exceeds the expected negative consequences by a sufficiently wide margin that THR is worth performing.



 
FIG. 1. Variables of the algorithm and their categories.

 
Ratings were made on a 9-point scale. The use of THR for a specific indication was considered to be appropriate if the panel's median rating was between 7 and 9 without disagreement, inappropriate if the value was between 1 and 3 without disagreement, and uncertain if the median rating was between 4 and 6 or if the members of the panel disagreed. Disagreement was defined as occurring when at least three panellists rated an indication from 1 to 3 and at least three others rated it from 7 to 9. This method did not try to force panellists to reach agreement on appropriateness. The ratings were confidential and took place in two rounds, using a modified Delphi process.
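
The classification rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the computerized algorithm used in the study; the function names `has_disagreement` and `classify_indication` are hypothetical, and the ratings in the usage note are invented.

```python
from statistics import median


def has_disagreement(ratings):
    """Disagreement as defined in the text: at least three panellists
    rate the indication 1-3 and at least three others rate it 7-9."""
    low = sum(1 for r in ratings if 1 <= r <= 3)
    high = sum(1 for r in ratings if 7 <= r <= 9)
    return low >= 3 and high >= 3


def classify_indication(ratings):
    """Classify one indication from the panel's ratings on the 9-point scale."""
    if any(r not in range(1, 10) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 9")
    # Disagreement makes the indication uncertain whatever the median is.
    if has_disagreement(ratings):
        return "uncertain"
    m = median(ratings)
    if m >= 7:
        return "appropriate"
    if m <= 3:
        return "inappropriate"
    return "uncertain"
```

For example, the invented ratings `[1, 2, 3, 7, 7, 8, 8, 9, 9]` have a median of 7 but are classified as uncertain, because three panellists rated the indication from 1 to 3 and six rated it from 7 to 9.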

The first round of ratings was performed before the panel meeting. Results were collated and presented to the panellists at a one-day meeting. Each panellist received the anonymous ratings of all the other panellists as well as a reminder of his or her own ratings. After extensive discussion, the panellists revised the indications according to the definition given above. Each panellist rated 216 separate indications. The value of the stages of the RAND® process is to improve intrapanel agreement and consensus on the definitions of the scenarios after the multidisciplinary discussion. This process provides some face validity.

Finally, all indications considered appropriate in the second round were rated by the main panel on a 9-point scale to determine those that were considered necessary. Necessity was defined as meaning that a procedure is not only appropriate but crucial and that it would be improper not to recommend it in a given clinical situation, as defined by previous authors [20]. Indications with a median necessity rating from 7 to 9 and with no disagreement were considered necessary. The other appropriate indications were considered elective.

Intra- and interpanel reliabilities were studied. Weighted kappa statistics were 0.81 for a test–retest intrapanel reliability study and 0.77 when comparison was made with a second panel of nine different orthopaedic surgeons. More details are given elsewhere [21].
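
For readers unfamiliar with the statistic, a weighted kappa can be computed as in the following Python sketch. The text does not say which weighting scheme was used or whether the comparison was made on the three appropriateness categories or on the raw 9-point ratings, so the linear weights, the three-category example and the function name `weighted_kappa` are all assumptions made for illustration.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Weighted kappa for two sets of ordinal classifications.

    categories must be listed in their natural order; weights is
    "linear" or "quadratic" (both are assumptions, see lead-in).
    """
    k = len(categories)
    n = len(rater_a)
    if n == 0 or n != len(rater_b) or k < 2:
        raise ValueError("need two equally long, non-empty rating lists")
    index = {c: i for i, c in enumerate(categories)}

    # Observed joint proportions.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement_weight(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed_dis = sum(disagreement_weight(i, j) * observed[i][j]
                       for i in range(k) for j in range(k))
    expected_dis = sum(disagreement_weight(i, j) * row[i] * col[j]
                       for i in range(k) for j in range(k))
    if expected_dis == 0:
        raise ValueError("kappa is undefined when only one category is used")
    return 1.0 - observed_dis / expected_dis
```

With identical classifications from both panels the function returns 1.0; values fall towards 0 as agreement approaches what chance alone would produce.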

Data collection
This prospective observational study took place in five large public hospitals (four affiliated to a university and one community-based); it started in December 1996 and ended in December 1997. All the hospitals belonged to the Basque Health Service—Osakidetza, a local government agency in the Basque Country, and to the Spanish National Health Service. The identities of the hospitals and surgeons were not revealed in the research reports. Physicians in each hospital were blinded to the study goals.

All consecutive patients with a diagnosis of osteoarthritis who were undergoing THR surgery and were followed in any of the five hospitals were included in the study. Patients with malignant, severe or psychiatric diseases, those who were unable to communicate and those who refused to participate were excluded. Of 600 patients who fulfilled the selection criteria, 97.2% agreed to participate. To collect data and determine appropriateness, we developed a computerized algorithm based on the results produced by our panel. We also developed data collection questionnaires that included variables covering the period before the intervention, admission and discharge, the intervention itself, and complications at 3 months after discharge. Besides the variables belonging to the appropriateness algorithm, which have been mentioned previously, other variables collected included sociodemographic data, patients' family support, height, weight, main complaint, 12 comorbidities (diabetes, hypertension, cardiac disease, dementia, depression, chronic obstructive pulmonary disease, stroke, cancer, renal disease, hepatic disease, anaemia and arthritis), other joints affected, previous interventions, intervention characteristics, local and general complications peri- and post-intervention, death, and length of hospital stay.

A single trainee reviewer collected the data with a standardized questionnaire. The reviewer was blinded to the specific study goals. He located all patients who fulfilled the selection criteria and interviewed them before surgery, at which time their permission for the study was also obtained. He collected data from each patient on pain, functional limitation and previous non-surgical treatment. Further data were recorded from medical records (diagnosis, age, ASA grade). A standardized questionnaire also was completed by a surgeon at each centre about bone quality, based on X-rays. At discharge, data related to admission and the intervention were gathered by reviewing the patients' medical records. Three months after discharge, all medical records were again reviewed to determine if any patient had been readmitted, died, or had any complication resulting from the intervention.

One year after discharge, all patients received a new questionnaire that included 11 items. Two questions measured pain intensity and the need for medication. Seven were related to functional status and the need for aid, and two were concerned with general health. Of the 583 patients, six had died and 21 had moved and could not be located. Of the 556 who finally received the questionnaire, 448 (80.6%) answered after two mail reminders. There were no statistically significant differences between responders and non-responders regarding sociodemographic variables, clinical characteristics, including pain and functional limitation, and appropriateness evaluation.
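
The patient flow reported in this section can be checked with simple arithmetic. The short Python sketch below uses only the counts given in the text; the variable names are illustrative, and reading the 97.2% participation rate as 583 of 600 is an inference from those counts.

```python
# Counts reported in the text.
eligible = 600          # patients who fulfilled the selection criteria
included = 583          # patients evaluated in the study
died = 6                # deaths before the 1-yr questionnaire
not_located = 21        # patients who had moved and could not be located
responders = 448        # questionnaires returned after two reminders

# Patients who actually received the 1-yr questionnaire.
mailed = included - died - not_located

participation_rate = 100 * included / eligible
response_rate = 100 * responders / mailed

print(f"questionnaires sent: {mailed}")
print(f"participation: {participation_rate:.1f}%")
print(f"1-yr response: {response_rate:.1f}%")
```

Both derived percentages agree with the figures quoted in the text (97.2% and 80.6%).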

Statistical analysis
The unit of study was the patient. In cases where two interventions were done (3.5% of cases) we selected the first. Descriptive statistics, frequency tables and the means and standard deviations were obtained. The χ² test and Fisher's exact test were used to test for statistically significant differences among proportions. For continuous variables (e.g. age), analysis of variance was performed in the univariate analysis.
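
As a reminder of how a Pearson chi-square test of proportions works, the statistic can be computed from a contingency table as in the Python sketch below. The analyses in the study were run in SAS; this function, its name and the counts in the usage note are hypothetical and serve only to illustrate the calculation.

```python
# Upper 5% critical values of the chi-square distribution for a few
# degrees of freedom (standard tabulated values).
CRITICAL_005 = {1: 3.841, 2: 5.991, 4: 9.488, 8: 15.507}


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an
    r x c table of counts (rows and columns must have non-zero totals)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if min(row_totals) == 0 or min(col_totals) == 0:
        raise ValueError("every row and column needs a non-zero total")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df
```

For the invented 2 x 2 table `[[10, 20], [20, 10]]` the statistic is about 6.67 with one degree of freedom, which exceeds the 5% critical value of 3.841. A table of five hospitals by three appropriateness categories would have eight degrees of freedom.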

The odds ratio and the 95% confidence interval (CI) for an inappropriate intervention were calculated for each of the hospitals, taking as the reference the hospital with the lowest rate of inappropriate cases (hospital 4).
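
The calculation of an odds ratio against a reference hospital can be sketched as follows. The text does not state how the confidence interval was obtained, so the log-based (Woolf) interval used here is an assumption, as are the function name and the counts in the usage note.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Odds ratio with a log-based (Woolf) confidence interval.

    a: inappropriate cases in the hospital of interest
    b: other cases in the hospital of interest
    c: inappropriate cases in the reference hospital
    d: other cases in the reference hospital
    z: normal quantile (1.959964 gives a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

With invented counts of 20 inappropriate cases out of 100 in one hospital and 10 out of 100 in the reference hospital, `odds_ratio_ci(20, 80, 10, 90)` gives an odds ratio of 2.25 with an interval whose lower limit falls just below 1, showing how a point estimate above 2 can still be compatible with no difference.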

Principal components analysis (PCA) [22] was used to establish the appropriate relationship in each domain of the questionnaire employed at 1 yr. A total score for pain and functional limitation was obtained. PCA was used to determine the weight to assign to each item in the scale. The items were categorized as minor, moderate and severe, as in the pre-intervention survey.
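
One common way of deriving item weights from PCA is to take the loadings of the first principal component of the item correlation matrix. The text does not describe the exact procedure, so the Python sketch below is only one plausible reading; the function names, the normalization of the weights to sum to 1 and the use of power iteration are all assumptions.

```python
import math


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns (items) of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    if min(sds) == 0:
        raise ValueError("every item must vary across patients")
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zr[i] * zr[j] for zr in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(matrix, iterations=500):
    """Leading eigenvector of a symmetric matrix by power iteration."""
    p = len(matrix)
    v = [1.0] + [0.0] * (p - 1)  # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def item_weights(rows):
    """Item weights proportional to the absolute first-component loadings."""
    loadings = first_component(correlation_matrix(rows))
    total = sum(abs(x) for x in loadings)
    return [abs(x) / total for x in loadings]
```

A total score for a patient would then be the weighted sum of that patient's item responses; items that move together with the rest of the scale receive larger weights.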

Effects were considered statistically significant at P < 0.05 unless otherwise noted. All statistical analyses were performed using SAS for Windows, version 6.12 [23].


    Results
 
During the 1-yr recruitment period, 583 patients with a diagnosis of osteoarthritis who underwent THR in any of the five participating hospitals were included. The mean age of the patients was 66.6 yr, and 52.1% of the patients were males. The main complaint was pain in 95% of the cases. The other hip was affected in 42.7% of the cases, and any other bodily joint in 63.5%.

Twelve comorbidities were recorded. There were no differences in comorbidity rates among the five hospital samples or appropriateness groups, and there was no relationship with the study outcome. Other variables, such as mean age, proportion of males, patient's family support, other joints affected and previous hip interventions, also had no effect on the study outcome. Surgical risk, assessed by the ASA classification system, differed among centres (Table 1).


 
TABLE 1. Descriptive statistics by hospital (n = 583 patients)

 
After applying the explicit criteria to the 583 interventions, we found that 39.1% of the interventions were considered appropriate, and most of these (86.4%) were necessary. The indication was judged uncertain in 47.2% of the interventions, and 13.7% of all patients were considered to have been treated inappropriately. Of the total of 216 indications that could have been scored by the expert panel, 88 (40.7%) appeared in the patient results. The indications found most frequently in each appropriateness category are shown in Table 2.
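
The percentages in this paragraph can be cross-checked with a few lines of Python. Only figures quoted in the text are used as inputs; the share of all interventions rated necessary is a derived value that is not reported in the text, and the variable names are illustrative.

```python
# Percentages of the 583 interventions, as reported in the text.
appropriate_pct = 39.1
uncertain_pct = 47.2
inappropriate_pct = 13.7
necessary_share_of_appropriate = 86.4  # % of appropriate interventions

# The three categories should account for every intervention.
total_pct = appropriate_pct + uncertain_pct + inappropriate_pct

# Derived: necessary interventions as a share of all interventions.
necessary_pct_of_all = appropriate_pct * necessary_share_of_appropriate / 100

# Share of the 216 panel indications that occurred among the patients.
indications_seen_pct = 100 * 88 / 216

print(f"categories sum to {total_pct:.1f}%")
print(f"necessary, share of all interventions: {necessary_pct_of_all:.1f}%")
print(f"indications observed: {indications_seen_pct:.1f}%")
```

The categories sum to 100%, the observed share of indications matches the reported 40.7%, and roughly one-third of all interventions fall in the necessary group.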


 
TABLE 2. Description of the most frequent indications in each rating category of appropriateness

 
When considering the differences among the centres, hospital 4 had the highest rate of appropriateness (42.8%) and the lowest rate of inappropriateness (7.1%). In hospitals 2 and 5, more than 15% of cases were inappropriate, and hospital 2 also had one of the lowest appropriateness rates (38%). These differences were statistically significant. Hospitals 2 and 5 had a risk of inappropriateness 2.5 (95% CI 0.90–7.9) and 2.6 (95% CI 1.0–8.0) times higher, respectively, than that of hospital 4. On the other hand, the highest appropriateness percentages were for hospitals 3 and 4, while the lowest were for hospital 1 (Table 3).


 
TABLE 3. Appropriateness rate by hospital

 
The mean age of the patients was higher in the appropriate (66.8 yr) than in the inappropriate group (62.9 yr), as was the percentage of women (57.5 vs 40%). Pain was the main complaint in every appropriateness group, but the percentage of patients whose main complaint was pain was higher among those considered appropriate cases (96.9 vs 87.5% of the inappropriate cases). The pain and functional limitation level was more severe in those evaluated as having undergone appropriate procedures. All of these differences were statistically significant. There were no differences in perioperative and 3-month postoperative complications, death rates or length of hospital stay among those who underwent appropriate, uncertain or inappropriate interventions.

One year after discharge, 448 patients answered a follow-up questionnaire. Among those who underwent an inappropriate procedure, 8% considered their health poor at that time; among those who underwent an uncertain intervention, 4% considered their health to be poor, and of those who underwent an appropriate intervention, 1% considered their health to be poor. These differences were statistically significant.

When their present health status was compared with that before the intervention, 8% of those who underwent an inappropriate intervention reported that their health was poorer now. Of those who underwent an uncertain or appropriate procedure, the responses were 4 and 2% respectively. These differences were not statistically significant. Pain and functional limitation were similar among the three appropriateness categories 1 yr after discharge, although the improvement from before to after the intervention was greater for those who had appropriate indications (Table 4).


 
TABLE 4. Comparison before and after intervention

 


    Discussion
 
Some authors consider THR to be one of the most relevant procedures to be investigated in health services research [24, 25]. Among the reasons are its impact on the patient, the high frequency with which it is performed, and its increasing cost. Although the expected variation in its use was not as high as in other studies [26], it does exist. This justifies studying the appropriateness of its indication.

The present study assessed prospectively the appropriateness of surgical indications for patients with osteoarthritis undergoing THR in a region of Spain in 1997, using appropriateness criteria developed by a national panel of experts. Applied as a screening tool for patients scheduled for surgery, these appropriateness criteria indicate a significant percentage of potentially inappropriate procedures, compared with other studies in which similar methods have been used [27, 28]. Nevertheless, this result implies that there is variation and uncertainty in the decision-making process regarding this procedure.

There were differences in the inappropriateness rates among centres. The fact that all these hospitals belong to the same public health system and have similar technological and human resources, and that there were no differences in the main sociodemographic and clinical variables of the patients indicates that further study is needed. Did the hospitals differ sufficiently in quality of care or managerial characteristics to justify these results? Was there any other variable that may explain the differences? These questions were beyond the scope of our study, but should be a focus for evaluation and improvement [29].

In contrast to a study of another procedure based on similar methodology [30], we did not find significant associations between the level of appropriateness and peri- or post-intervention rates of complications or other quality indicators, such as the death rate and the length of hospital stay. Nevertheless, the fact that those patients whose interventions were considered inappropriate had complication rates similar to those of other categories of patients implies an important risk to the patient. If the procedure was performed inappropriately, the risk to the patient would have been unnecessary. Furthermore, our results indicated that patients who underwent appropriate procedures reported a benefit at 1 yr, but for those who underwent inappropriate procedures the benefit, although present, was minor.

As in previous studies of other procedures [30–32], the percentage of indications considered uncertain was high. Our panel criteria were able to help us solve 54% of the indications, but in the remaining 46%, concerning mainly those patients with moderate pain or functional limitation, the instrument did not allow a decision to be made. This indicates that the panel could not come to an agreement and that the instrument was not sufficiently sensitive. To find out more about the benefits achieved by the patients, prospective studies (randomized clinical trials if feasible) are necessary to measure all comprehensive outcomes, clinical as well as subjective, such as quality of life and patient satisfaction [33]. Prospective studies will also be able to measure changes in health after the intervention. We attempted to answer this question, in part, by measuring the patients' general health, pain and functional limitation 1 yr after discharge. Although the pain and functional limitation were similar among the three appropriateness categories, those patients who underwent an appropriate intervention perceived their general health to be slightly better than did patients who underwent an inappropriate procedure. This could indicate that patients who underwent an appropriate procedure experienced a greater improvement in functional limitation and pain than those who underwent an inappropriate procedure. In addition, the latter patients may have had other sources of pain or disability that would not have responded to THR, or they were made ill by THR. It must be noted that pain (mainly) and functional limitation are key variables in the panel decision-making process. This has been shown in other studies [34], and should be taken into account.

The only previous study in which explicit criteria were developed to assess THR oriented its 120 indications to general practitioners who referred patients to specialists, and did not include surgical risk or bone quality assessment in the algorithm [35]. The field study carried out by that group using these criteria [36] reported percentages of pain and functional limitation similar to those found in our study. Their retrospective design presented some important methodological problems. Their review was based on medical records, and the researchers lacked relevant data for a high percentage (almost 50%) of the 329 patients recruited. When they applied their algorithm, the inappropriateness rates ranged from 4.4 to 15.8%. In our study, the prospective design allowed us to capture and follow all patients who fulfilled the selection criteria and to minimize losses.

There are two main questions to be addressed: were the explicit criteria valid, and were 14% of the indications really inappropriate? One of the major criticisms of the RAND method has been the unknown extent to which scientifically validated elements enter into the panel's decision-making process. Panellists had access to an up-to-date literature review, and the ratings of the indication for other procedures have been shown to reflect this knowledge. However, when few rigorous, randomized, controlled trials exist, as in the case of THR, the panel's judgement becomes more important than scientific data about efficacy, because the data do not exist [37]. As suggested by some authors [37, 38], this method may be useful when comparing the level of appropriateness among populations but not when directing the care of individual patients. When the method has been used as a utilization review tool, indications considered inappropriate have been subjected to individualized and thorough revision before being considered inappropriate [39]. In our case, this was not possible because of the absence of clinically relevant data in the medical record. The revision should have taken place at the same time as the criteria were applied, in order to establish which other factors may have influenced the final decision.

The composition of the panel may also have affected the ratings of appropriateness, as many studies have shown [40, 41]. Panels consisting of individuals from different specialties tend to rate procedures as less appropriate and more inappropriate than those representing one specialty [42]. In our case, we used the criteria developed by a panel of orthopaedic surgeons, which tended to be more liberal. When developing our criteria, we followed recommendations made for studies of this kind [43]. Regardless of the criticisms of this method, it has been accepted as an important tool in the evaluation of the care that is provided and the study of variations in it [28, 44].

The method of data collection had some limitations. The sole blinded reviewer was a physician trained to assess and record the main variables of the algorithm in a standardized manner so as to reduce the chance of bias. However, the quality of the important variables in the algorithm, such as pain and functional limitation, depended on his assessment. This introduced the possibility of information bias, which may have influenced the final judgement.

Discrepancies between physician and patient evaluations of disease have been reported for several diseases, including the one examined in this study [45]. Otherwise, why did patients with moderate pain or functional limitation undergo the procedure? Other variables, such as patient preferences and special social circumstances, may have influenced the physicians' decision-making. Also, the number of surgeons performing THR in each centre, the overall decision-making process and the way the service or hospital is organized may lead to differences among centres. Finally, although our data might be generalizable to some public centres in Spain, they would not be generalizable to other hospitals in developed countries that have similar health systems.

Preventing the overuse of THR would have a major impact on medical practice and costs. However, underuse of a medical procedure, which was not investigated in this study but may exist, has been recognized to be an equally important factor in determining the quality of care [46].

In conclusion, this prospective observational study demonstrates the gap that exists between actual day-to-day clinical practice and the explicit criteria that are generally accepted to reflect the appropriateness of care. Even as a screening tool, these criteria indicate that there is a considerable percentage of uncertain and inappropriate indications, and the patients involved may be at greater risk of generally poor health. Further research is needed on the effectiveness of THR compared with other alternatives [47]. Future studies should consider the benefits that patients achieve by undergoing this procedure in specific situations, such as those described in this study. This will require medium- to long-term follow-up studies to determine all relevant changes experienced by the patients. Meanwhile, efforts should be made to determine where the best care is provided and the reasons that explain the differences among centres.


 
APPENDIX 1. Classification of variables of the algorithm

 


    Acknowledgments
 
We thank Dr Pablo Lazaro of the Unidad de Investigación de Servicios Sanitarios of the Instituto de Salud Carlos III, Madrid, for assistance in the development of the algorithm and the panel debate, and Dr James Kahan for reviewing the manuscript. We also thank the following people for their contributions to this study: Drs Jose M. Aranburu, Andoni Arcelay, Jesús Azkoaga, Pedro Armendariz, Enrique Cáceres, Xabier Elexpe, Begoña Goicoetxea, Jon Letona, Manuel Martínez-Grande, Enrique Queipo de Llano, Ramón Tobio and Ignacio Vidaurreta, and individuals from the Quality Units of the participating hospitals. This study was supported by a grant from the FIS (reference 96/0020–05) and the Department of Health of the Basque Government.


    Notes
 
Correspondence to: J. M. Quintana, Unidad de Investigación, Hospital de Galdakao, Barrio Labeaga s/n. 48960 Galdakao, Vizcaya, Spain.


    References
 

  1. Hadorn DC, Brook RH. The health care resource allocation debate. Defining our terms. J Am Med Assoc1991;266:3328–31.[Abstract]
  2. Brook RH. Appropriateness: the next frontier [editorial]. Br Med J1994;308:218–9.[Free Full Text]
  3. Schoenbaum SC. Toward fewer procedures and better outcomes. J Am Med Assoc1993;269:794–6.[ISI][Medline]
  4. Blumenthal D. The variation phenomenon in 1994. N Engl J Med1994;331:1017–8.[Free Full Text]
  5. Keller RB, Soule DN, Wennberg JE, Hanley DF. Dealing with geographic variations in the use of hospitals. The experience of the Maine Medical Assessment Foundation Orthopaedic Study Group. J Bone Joint Surg Am1990;72:1286–93.[Abstract]
  6. Madhok R, Lewallen DG, Wallrichs SL, Ilstrup DM, Kurland R, Melton J. Trends in the utilization of primary total hip arthroplasty, 1969 through 1990: A population-based study in Olmsted county, Minnesota. Mayo Clin Proc1993;68:11–8.[ISI][Medline]
  7. Charnley J. Anchorage of the femoral head prosthesis to the shaft of the femur. J Bone Joint Surg1960;42B,28–30.
  8. Peterson MGE, Hollenberg JP, Szatrowski P, Johanson NA, Mancuso CA, Charlson ME. Geographic variations in the rates of elective total hip and knee arthroplasties among Medicare beneficiaries in the United States. J Bone Joint Surg1992;74A:1530–9.[Abstract]
  9. Imamura K, Gair R, McKee M, Black N. Appropriateness of total hip joint replacement in the United Kingdom. World Hospitals Health Serv1997;32:10–14.
  10. Laupacis A, Bourne R, Rorabech C, Feeny D, Wong C, Tugwell P. The effect of elective total hip replacement on health-related quality of life. J Bone Joint Surg1993;75A:1619–26.[Abstract]
  11. Barrack RL. Hip arthroplasty: problems and decisions. Assessment of the symptomatic total hip. Orthopedics1994;17:793–5.[ISI][Medline]
  12. MacWillian CH, Yood MU, Verner JJ, McCarthy BD, Ward RE. Patient-related risk factors that predict poor outcome after total hip replacement. Health Serv Res1996;31:623–38.[ISI][Medline]
  13. Faulkner A, Kennedy LG, Baxter K, Donovan J, Wilkinson M, Bevan G. Effectiveness of hip prostheses in primary total hip replacement: a critical review of evidence and economic model. Health Technol Assess 1998;2:3.
  14. Brook RH, Chassin MR, Fink A, Solomon DH, Kosecoff J, Park RE. A method for the detailed assessment of the appropriateness of medical technologies. Int J Technol Assess Health Care 1986;2:53–63.
  15. Merle D'Aubigne R. Cotation chiffrée de la fonction de la hanche. Rev Chirurg Orthop Reparatrice Appareil Mot 1970;56:481–6.
  16. Lequesne M. Indices of severity and disease activity for osteoarthritis. Sem Arthritis Rheum 1991;20:48–54.
  17. Hochberg MC, Chang RW, Dwosh I, Lindsey S, Pincus T, Wolfe F. The American College of Rheumatology 1991 revised criteria for the classification of global functional status in rheumatoid arthritis. Arthritis Rheum 1992;35:498–502.
  18. Singh M, Nagrath AR, Maini PS. Changes in trabecular pattern of the upper end of the femur as an index of osteoporosis. J Bone Joint Surg 1970;52A:457–67.
  19. Schneider AJL. Assessment of risk factors and surgical outcome. Surg Clin N Am 1983;63:1113–26.
  20. Quintana JM. Empleo de la metodología de uso apropiado en el estudio de la utilización de un procedimiento quirúrgico: prótesis de cadera. Leioa, Spain: The University of the Basque Country, 1998.
  21. Kahan JP, Bernstein SJ, Leape LL, Hilborne LH, Park RE, Parker L et al. Measuring the necessity of medical procedures. Med Care 1994;32:357–65.
  22. Quintana JM, Aróstegui I, Azkarate J et al. Evaluation of explicit criteria for total hip joint replacement. J Clin Epidemiol 2000; in press.
  23. Hatcher L. A step-by-step approach to using the SAS system for factor analysis and structural equation modeling. Cary, NC: SAS Institute, 1994.
  24. SAS Institute Inc. SAS Procedures Guide, Version 6. Cary, NC: SAS Institute, 1994.
  25. Phelps CE, Parente ST. Priority setting in medical technology and medical practice assessment. Med Care 1990;28:703–23.
  26. Phelps CE, Mooney C. Correction and update on ‘Priority setting in medical technology assessment’. Med Care 1992;30:744–51.
  27. Chassin MR, Brook RH, Park RE, Keesey J, Fink A, Kosecoff J et al. Variation in the use of medical and surgical services by the Medicare population. N Engl J Med 1986;314:285–90.
  28. Larequi-Lauber T, Vader JP, Burnand B, Brook RH, Kosecoff J, Sloutskis D et al. Appropriateness of indications for surgery of lumbar disc hernia and spinal stenosis. Spine 1997;22:203–9.
  29. Casparie AF. The ambiguous relationship between practice variation and the appropriateness of care: an agenda for further research. Health Policy 1996;35:247–65.
  30. Schoenbaum SC. Toward fewer procedures and better outcomes. J Am Med Assoc 1993;269:794–6.
  31. Winslow CM, Solomon DH, Chassin MR, Kosecoff J, Merrick NJ, Brook RH. The appropriateness of carotid endarterectomy. N Engl J Med 1988;318:721–7.
  32. Hilborne LH, Leape LL, Bernstein SJ, Park RE, Fiske ME, Kamberg CJ et al. The appropriateness of use of percutaneous transluminal coronary angioplasty in New York State. J Am Med Assoc 1993;269:761–5.
  33. McGlynn EA, Naylor CD, Anderson GM, Leape LL, Park RE, Hilborne LH et al. Comparison of the appropriateness of coronary angiography and coronary artery bypass graft surgery between Canada and New York State. J Am Med Assoc 1994;272:934–40.
  34. Wennberg J. The paradox of appropriate care. J Am Med Assoc 1987;258:2568–9.
  35. Hadorn DC, Holmes AC. The New Zealand priority criteria. Br Med J 1997;314:131–4.
  36. Naylor CD, Williams J. Primary hip and knee replacement surgery: Ontario criteria for case selection and surgical priority. Qual Health Care 1996;5:20–30.
  37. Walraven CV, Peterson JM, Kapral M, Chan B, Bell M, Hawker G et al. Appropriateness of primary total hip and knee replacements in regions of Ontario with high and low utilization rates. Can Med Assoc J 1996;155:697–706.
  38. Naylor CD. What is appropriate care? N Engl J Med 1998;338:1918–20.
  39. Shekelle P, Kahan JP, Bernstein SJ, Leape LL, Kamberg CJ, Park RE. The reproducibility of a method to identify the overuse and underuse of medical procedures. N Engl J Med 1998;338:1888–95.
  40. Dubois RW. Appropriateness studies. N Engl J Med 1994;330:433.
  41. Coulter I, Adams A, Shekelle P. Impact of varying panel membership on ratings of appropriateness in consensus panels: a comparison of a multi- and single-disciplinary panel. Health Serv Res 1995;30:577–91.
  42. Fraser GM, Pilpel D, Kosecoff J, Brook RH. Effect of panel composition on appropriateness ratings. Int J Qual Health Care 1994;6:251–5.
  43. Kahan JP, Park RE, Leape LL, Bernstein SJ, Hilborne LH, Parker L et al. Variations by specialty in physician ratings of the appropriateness and necessity of indications for procedures. Med Care 1996;34:512–23.
  44. Naylor CD, Guyatt G. Users' guide to the medical literature. XI. How to use an article about a clinical utilization review. J Am Med Assoc 1996;275:1435–9.
  45. Phelps CE. Appropriateness studies. N Engl J Med 1994;330:433–4.
  46. Lieberman JR, Dorey F, Shekelle P, Schumacher L, Thomas BJ, Kilgus DJ. Differences between patients' and physicians' evaluations of outcome after total hip arthroplasty. J Bone Joint Surg 1996;78A:835–8.
  47. Selby JV, Fireman BH, Lundstrom RJ, Swain BE, Truman AF, Wong CC et al. Variation among hospitals in coronary-angiography practices and outcomes after myocardial infarction in a large health maintenance organization. N Engl J Med 1996;335:1888–96.
  48. Total hip replacement: NIH consensus conference. J Am Med Assoc 1995;273:1950–6.
Submitted 25 August 1999; revised version accepted 12 June 2000.