Re-evaluation and modification of the Stuivenberg Hospital Acute Renal Failure (SHARF) scoring system for the prognosis of acute renal failure: an independent multicentre, prospective study

R. L. Lins1, M. M. Elseviers2, R. Daelemans1, P. Arnouts3, J.-M. Billiouw4, M. Couttenye2, E. Gheuens1, P. Rogiers5, R. Rutsaert2, P. Van der Niepen6 and M. E. De Broe2

1 Department of Nephrology-Hypertension, ACZA Campus Stuivenberg, Lange Beeldekensstraat 267, B-2060 Antwerpen, 2 Department of Nephrology-Hypertension, University Hospital Antwerp, Wilrijkstraat 10, B-2650 Edegem/Antwerpen, 3 Department of Nephrology-Hypertension, Sint Jozefziekenhuis, Steenweg op Merksplas 44, B-2300 Turnhout, 4 Department of Nephrology-Hypertension, O.L.-Vrouw Ziekenhuis, Moorselbaan 164, B-9300 Aalst, 5 Department of Intensive Care, General Hospital Middelheim, Lindendreef 1, B-2020 Antwerpen and 6 Department of Nephrology-Hypertension, Free University Brussels (VUB), Laarbeeklaan 101, B-1090 Brussels, Belgium

Correspondence and offprint requests to: Robert Lins, MD, PhD, Department of Nephrology-Hypertension, ACZA Campus Stuivenberg, Lange Beeldekensstraat 267, B-2060 Antwerpen, Belgium. Email: robert.lins{at}pro.tiscali.be



   Abstract
 
Background. A prognostic scoring system for hospital mortality in acute renal failure (Stuivenberg Hospital Acute Renal Failure, SHARF score) was developed in a single-centre study. The scoring system consists of two scores, for the time of diagnosis of acute renal failure (ARF) and for 48 h later, each originally based on four parameters (age, serum albumin, prothrombin time and heart failure). The scoring system was now tested and adapted in a prospective study.

Methods. The study involved eight intensive care units. We studied 293 consecutive patients with ARF in 6 months. Their mortality was 50.5%. The causes of ARF were medical in 184 (63%) patients and surgical in 108 (37%). In the latter group, 74 (69%) patients underwent cardiac and 19 (18%) vascular surgery.

Results. As the performance of the original SHARF scores was much lower in the multicentre study than in the original single-centre study, we re-analysed the multicentre data to customize the original model for the population studied. The independent variables were the score developed in the original study plus all additional parameters that were significant on univariate analysis. The new multivariate analysis revealed an additional subset of three parameters for inclusion in the model (serum bilirubin, sepsis and hypotension). For the modified SHARF II score, r2 was 0.27 at 0 h and 0.33 at 48 h, the receiver operating characteristic (ROC) values were 0.82 and 0.83, and the Hosmer–Lemeshow goodness-of-fit P values were 0.19 and 0.05, respectively.

Conclusion. After customizing and by using two scoring moments, this prediction model for hospital mortality in ARF is useful in different settings for comparing groups of patients and centres, quality assessment and clinical trials. We do not recommend its use for individual patient prognosis.

Keywords: acute renal failure; intensive care unit; mortality prediction; prognosis; prospective multicentre study; severity score



   Introduction
 
In view of current trends in health care, especially with resources becoming increasingly scarce, the need to predict patient outcomes is evident [1]. After proper testing in various institutions and adaptation to different populations, prognostic scoring systems may aid physicians in different ways: to calculate the probabilities of outcomes in patient groups and in individual patients, advise patients and families regarding questions about continued life support, compare institutions and judge the quality of care and the performance of a medical care unit. Such predictive models can also be used to study new therapies, which might ameliorate outcomes in patients.

Acute renal failure (ARF) is a disease where risk modelling is important due to its high mortality rate of 50%, even higher in certain populations [2]. In general, however, scoring systems predict only the prognosis of a group of patients and are useful for research, but they are unlikely to be useful in the assessment of the prognosis of a given individual [3]. In these scoring systems, the parameters were obtained only once after hospitalization or recognition of ARF. Although it is important for intensive care unit (ICU) physicians and nephrologists to predict mortality in the first 24 h after hospitalization, according to some authors, relying on a single scoring point in time can be misleading [4–7].

In the developmental phase of our scoring system, a prospective, cohort, single-centre study was performed, involving 197 adult patients admitted consecutively to a medical ICU. Relevant parameters were documented at 0 and 48 h, in order to develop a score usable in different centres and for different causes of ARF [8]. Thus we developed a scoring protocol with two measuring points (Stuivenberg Hospital Acute Renal Failure, SHARF), one applied at the diagnosis of ARF and the other 48 h later. The same parameters were included in the score at the two time points, but were assigned different weights: age, albumin, prothrombin time, respiratory support and heart failure. Their r2 values were 0.36 and 0.44, respectively, which means that 36 and 44%, respectively, of the predicted mortality is explained by the model. The receiver operating characteristic (ROC) values at 0 and 48 h were 0.87 and 0.90 [8].

The model was next tested in the present prospective multicentre study in eight ICUs located in different medical and surgical settings and with different case mixes of patients.



   Methods
 
To re-evaluate and improve the model, we included in this study all adult patients (age ≥18 years) with a presumptive diagnosis of ARF who were admitted consecutively during a 6 month study period (from 15 September 1997 to 15 March 1998) to any of eight participating ICUs. These units were in two university hospitals with mixed surgical–medical populations (573 and 700 hospital beds and 30 and 30 ICU beds, respectively) and in six general hospitals: three with mixed populations (294–558 hospital beds and 6–30 ICU beds), one with a predominantly surgical population (651 hospital beds and 12 ICU beds) and two with almost exclusively medical populations (651 and 957 beds and 12 and 36 ICU beds). Included in the eight was the medical unit used for the original development of the model. Admission to the ICU was based on the clinical judgement of the attending physician in the emergency department or of the responsible physician of the ward where the patient had been hospitalized.

ARF was defined either as a serum creatinine of >2.0 mg/dl (176.8 µmol/l) in patients with previously normal kidney function, or in patients with known pre-existing mild to moderate renal disease with a >50% increase in creatinine above basal values. Patients with known chronic renal disease and a serum creatinine >3 mg% (266 µmol/l) before any acute deterioration or with clearly reduced kidney size on the first echography after admission did not qualify for the study. If renal function in patients with hydronephrosis recovered by >50%, the nature of their disease was considered acute.
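The creatinine criteria above can be sketched as a small predicate. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, the kidney-size and hydronephrosis rules are not modelled, and the conversion factor of 88.4 µmol/l per mg/dl is the standard one for creatinine.

```python
from typing import Optional

# Standard creatinine conversion: 1 mg/dl = 88.4 µmol/l
MG_DL_TO_UMOL_L = 88.4


def meets_arf_definition(creatinine_mg_dl: float,
                         baseline_mg_dl: Optional[float] = None) -> bool:
    """Return True if the creatinine criterion for ARF is met.

    baseline_mg_dl is None for previously normal kidney function;
    otherwise it is the basal creatinine of a patient with known
    pre-existing renal disease (hypothetical parameter name).
    """
    if baseline_mg_dl is None:
        # Previously normal kidney function: creatinine >2.0 mg/dl
        return creatinine_mg_dl > 2.0
    if baseline_mg_dl > 3.0:
        # Chronic renal disease with creatinine >3 mg/dl before the
        # acute deterioration did not qualify for the study
        return False
    # Pre-existing mild to moderate disease: >50% rise above basal value
    return creatinine_mg_dl > 1.5 * baseline_mg_dl


if __name__ == "__main__":
    print(meets_arf_definition(2.4))        # previously normal function
    print(meets_arf_definition(2.9, 2.0))   # known mild renal disease
```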

The first set of data was collected on the first day that the criteria for the definition of ARF were met, the day of diagnosis (T0). For patients referred later in the course of their illness to the ICU, the day of admission to the ICU was taken as the starting day (T0). The second measurement point was 48 h later (T48).

The time of diagnosis of ARF in relation to hospital admission was defined as ‘initial’ if the inclusion criteria were met at the time of admission to the ICU, and as ‘delayed’ if the criteria were met during the ICU stay or if ARF developed when the patient was an in-patient on a regular ward before transfer to the ICU. The type of ARF, the cause and the primary disease setting were determined exactly as in the original single-centre study [8]. Also, the same clinical and laboratory parameters were documented at T0 and T48. The overall severity of illness was based on the APACHE II score [9], determined on the day when the first data collection was done for the score (i.e. T0). Relevant definitions of clinical parameters are given in Table 1. Organ failure was defined according to published criteria [10–12].


 
Table 1. Definitions of clinical parameters

 
The outcome of hospital treatment was determined as ‘died’, ‘end-stage renal disease’ (ESRD), ‘partially recovered’ (hospital discharge with a serum creatinine >2 mg/dl) or ‘completely recovered’. Patients who were transferred to other departments were followed-up to their discharge or death in hospital, and their last known serum creatinine was recorded.

From each of the eight participating institutions, every eligible patient was reported to the coordinating centre within 24 h after diagnosis, preferably using an established reporting form after it had been filled with admission data. After completion of the T48 data, the form was sent to the study monitor within 1 week after diagnosis. Corrections and additional information were requested when necessary. All laboratory data were given in the units of the local laboratory, and were recalculated to the units of the coordinating centre.

Statistical methods
All the prospectively collected parameters were tested for significance with univariate analysis using the Student t-test and the χ² test.

First, all recorded parameters were compared between centres, to evaluate the quality of data collection and differences in patient populations and interpretation of parameters.

Secondly, the performance of the original model with values for T0 and T48 (SHARF0 and SHARF48) [8] was evaluated. The squared correlation coefficient (r2) of the linear regression analysis was used to test the explanatory power of the model. The areas under the ROC curves were used to judge the discrimination ability of our approach [13]. The degree of correspondence (‘fitting’) between score and outcome was determined by comparing the observed with the expected calculated mortality using the Hosmer–Lemeshow goodness-of-fit test [15]. In this test, which compares the observed and predicted frequencies for up to 10 cells, a small P-value means that the predicted values do not fit the data.
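The two performance measures named above can be illustrated with a short sketch. It is not the software used in the study: the function names are hypothetical, the ROC area is computed by direct pairwise comparison, and the Hosmer–Lemeshow C statistic uses equal-sized cells of ranked predicted probabilities with g − 2 degrees of freedom.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], died: Sequence[int]) -> float:
    """Area under the ROC curve: the probability that a non-survivor has a
    higher score than a survivor (ties count as one half)."""
    pos = [s for s, d in zip(scores, died) if d == 1]
    neg = [s for s, d in zip(scores, died) if d == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(probs: Sequence[float], died: Sequence[int],
                    groups: int = 10) -> Tuple[float, int]:
    """Return (C statistic, degrees of freedom) for predicted probabilities."""
    # Rank patients by predicted probability only (stable sort)
    ranked = sorted(zip(probs, died), key=lambda pair: pair[0])
    n = len(ranked)
    c_stat = 0.0
    used = 0
    for g in range(groups):
        cell = ranked[g * n // groups:(g + 1) * n // groups]
        if not cell:
            continue
        used += 1
        observed = sum(d for _, d in cell)
        expected = sum(p for p, _ in cell)
        size = len(cell)
        # Cells with expected deaths of 0 or `size` add nothing here,
        # which avoids a division by zero
        if 0.0 < expected < size:
            c_stat += (observed - expected) ** 2 / (
                expected * (1.0 - expected / size))
    return c_stat, used - 2
```

With the default of 10 cells the statistic has 8 degrees of freedom, which matches the df reported in the Results.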

Thirdly, since the performance of the model in this population clearly was weaker than its performance in the original population, we re-analysed the data in order to identify the best additional subset of variables to customize the original model for the new populations. The independent variables that were entered in the new multivariate analysis were the SHARF scores developed in the original single-centre study plus all other parameters that were significant on univariate analysis. Weighted scores were assigned to each parameter based on the regression coefficients of the linear regression analysis. Parameters with skewed distributions were divided into categories to neutralize the effect of outliers. Age was divided into decades to simplify the formulae for bedside calculation. The coefficients were multiplied by 100 and rounded to obtain whole numbers. The final adapted scores at T0 (SHARF II0) and at T48 (SHARF II48) were calculated by summing the weights associated with each parameter. In the linear model, the significant contributors to mortality were exactly the same as in the logistic model. The same had been observed in the original, single-centre study [8]. The probability of in-hospital mortality was calculated using the score as the single parameter in a logistic regression equation: logit = β0 + β1(score). The logit was then converted to a probability of hospital mortality as Pr(y = 1 | logit) = e^logit/(1 + e^logit), where y is 1 for patients who died and 0 for patients who lived, Pr indicates probability and e indicates the base of the natural logarithm [14]. Risk ratios were calculated for the parameters found to be significant contributors. The adapted scores (SHARF II0 and SHARF II48) were again tested using the r2, ROC values and Hosmer–Lemeshow statistics as had been done for the original scores.
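The conversion from score to probability can be sketched as follows. The coefficient values in the usage lines are purely illustrative placeholders, since the fitted intercept and slope are not given in the text, and the function name is hypothetical.

```python
import math


def mortality_probability(score: float, beta0: float, beta1: float) -> float:
    """Convert a score to a probability of hospital death.

    Uses logit = beta0 + beta1 * score, then
    Pr = e^logit / (1 + e^logit), written here in the algebraically
    equivalent form 1 / (1 + e^-logit).
    """
    logit = beta0 + beta1 * score
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Hypothetical coefficients, NOT the fitted values from the study
    beta0, beta1 = -3.0, 0.06
    for score in (20, 50, 80):
        print(score, round(mortality_probability(score, beta0, beta1), 3))
```

Because the score is the only predictor, the curve is monotonic: a higher score always maps to a higher predicted probability when the slope is positive.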

Fourthly, the model was also tested in individual centres with at least 30 patients included in the study and in certain subpopulations: pre-renal causes vs acute tubular necrosis; medical vs surgical patients; dialysed vs non-dialysed patients; and initial vs delayed diagnosis of ARF. The standardized mortality ratio (SMR; ratio of the observed number of deaths vs that predicted by the model) for different ICUs and subpopulations [16] and the ROC values were calculated with their 95% confidence intervals (CIs).
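As an illustration of the SMR calculation, the sketch below divides observed deaths by the sum of the predicted probabilities. The method used for the 95% CI is not stated in the text; the log-scale Poisson approximation here is an assumption chosen for brevity, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def standardized_mortality_ratio(
        died: Sequence[int],
        predicted: Sequence[float]) -> Tuple[float, float, float]:
    """Return (SMR, lower 95% limit, upper 95% limit).

    SMR = observed deaths / deaths predicted by the model.
    Assumption: the interval treats observed deaths as Poisson and uses
    a normal approximation on the log scale.
    """
    observed = sum(died)
    expected = sum(predicted)
    if observed == 0 or expected <= 0:
        raise ValueError("need at least one death and a positive expectation")
    smr = observed / expected
    half_width = 1.96 / math.sqrt(observed)
    return smr, smr * math.exp(-half_width), smr * math.exp(half_width)
```

An SMR near 1 with an interval that includes 1 is consistent with good agreement between predicted and observed mortality; with few deaths the interval becomes wide, which is the 'uncertainty' noted for small centres and subgroups.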

Finally, the original and the newly designed scores were compared with the Acute Tubular Necrosis Severity Index of Liano [17] and the APACHE II score [9].



   Results
 
A total of 80 895 patients were admitted to the participating hospitals in the 6 month study period; of these, 5113 were admitted initially or after some time to the ICUs of those hospitals. A total of 293 patients (0.36% of hospital admissions and 5.73% of ICU admissions) suffered from ARF. In the ARF group, 148 patients (50.5%) died, 42 of them (28.3%) within 48 h after the onset of ARF. The causes of ARF were medical in 184 (63.1%) and surgical in 108 (36.9%) patients. In the latter group, 74 (68.5%) had cardiac surgery and 19 (17.6%) vascular surgery. The average interval between ICU admission and diagnosis of ARF was 4.6±18.3 days for those initially admitted to the ICU and 11.3±16.0 for delayed admissions to the ICU (P-value of difference <0.01). The basic descriptive characteristics of patients are summarized in Table 2.


 
Table 2. Basic characteristics of the patients

 
Outcome was unaffected by gender, history of myocardial infarction, chronic renal failure, chronic obstructive lung disease, chronic liver disease, alcohol abuse, diabetes, underlying neoplasm, serum creatinine, glucose, sodium and haemoglobin, white blood cell count, urinary sodium or urine osmolality (a total of 15 parameters).

In univariate analysis, 20 other parameters were significantly different at T0 or T48 between survivors and non-survivors (Table 3). Using the original formulae, the r2 decreased to values <0.25 and the areas under the ROC curve to <0.80.


 
Table 3. Parameters significantly different between survivors (n = 147) and non-survivors (n = 146)

 
In the present analysis of the data, the original SHARF score formula plus three additional parameters (bilirubin, sepsis and hypotension) contributed significantly to the predictive value of the hospital mortality model at 0 and 48 h.

The final scoring formulae were:

SHARF II0 = 3.0 × age decade + 2.6 × serum albumin category T0 + 1.3 × prothrombin time category T0 + 16.8 × respiratory support T0 + 3.9 × heart failure T0 + 2.8 × serum bilirubin T0 + 27 × sepsis T0 + 21 × hypotension T0 − 17.

SHARF II48 = 3.9 × age decade + 3.3 × serum albumin category T0 + 1.7 × prothrombin time category T0 + 23.7 × respiratory support T48 + 8.8 × heart failure T48 + 2.5 × serum bilirubin T48 + 24 × sepsis T48 + 17 × hypotension T0 − 28.
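The two formulae can be transcribed directly into code. The sketch below is an illustration only: the class and field names are hypothetical, the inputs are assumed to be already coded as in Table 4 (which is not reproduced here), and the 48 h function follows the formula exactly as printed, taking albumin, prothrombin time and hypotension from T0.

```python
from dataclasses import dataclass


@dataclass
class SharfInputs:
    """Coded values at one time point (hypothetical field names)."""
    age_decade: float
    albumin_cat: float
    pt_cat: float
    resp_support: int    # 0 = absent, 1 = present
    heart_failure: int   # 0 = absent, 1 = present
    bilirubin: float
    sepsis: int          # 0 = absent, 1 = present
    hypotension: int     # 0 = absent, 1 = present


def sharf_ii_0(t0: SharfInputs) -> float:
    """SHARF II score at diagnosis (T0)."""
    return (3.0 * t0.age_decade + 2.6 * t0.albumin_cat + 1.3 * t0.pt_cat
            + 16.8 * t0.resp_support + 3.9 * t0.heart_failure
            + 2.8 * t0.bilirubin + 27 * t0.sepsis
            + 21 * t0.hypotension - 17)


def sharf_ii_48(t0: SharfInputs, t48: SharfInputs) -> float:
    """SHARF II score at 48 h, mixing T0 and T48 terms as printed."""
    return (3.9 * t0.age_decade + 3.3 * t0.albumin_cat + 1.7 * t0.pt_cat
            + 23.7 * t48.resp_support + 8.8 * t48.heart_failure
            + 2.5 * t48.bilirubin + 24 * t48.sepsis
            + 17 * t0.hypotension - 28)
```

Keeping the T0 and T48 inputs as separate objects makes it explicit which time point each term is drawn from.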

The data on the parameters included in the model are shown in Table 4. The calculation of the score is described in the Appendix. The mean values and 95% CIs of SHARF II scores were 36 (33–40) for survivors and 63 (59–67) for non-survivors at T0, and 32 (28–36) and 67 (61–72) at T48, respectively. The predicted probability of death for any possible score is given in Figure 1.


 
Table 4. Details of variables included in SHARF II0 and SHARF II48

 


 
Fig. 1. Probability of death curves at 0 and 48 h for SHARF II. The probability of in-hospital mortality was calculated by using the score as the single parameter in a logistic regression equation [14].

 
The r2 values were 0.27 at T0 and 0.33 at T48. The discriminative power of the new scoring system showed ROC values of 0.82 and 0.83, respectively. The calibration, tested with the Hosmer–Lemeshow goodness-of-fit C statistic, was for T0, C = 9.95, df = 8, P = 0.191; and for T48, C = 15.66, df = 8; P = 0.048. Figure 2 shows graphically the calibration of the observed and the predicted mortalities in probability intervals of 10%.
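For readers who wish to see how a Hosmer–Lemeshow C value maps to a P-value, the sketch below evaluates the upper tail of the χ² distribution using the closed form that holds for even degrees of freedom. The function name is hypothetical; the example uses the T48 values quoted above (C = 15.66, df = 8).

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate for even df.

    For df = 2m the tail is exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k   # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # T48 calibration statistic reported in the text
    print(round(chi2_sf_even_df(15.66, 8), 3))
```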



 
Fig. 2. Calibration curves for SHARF II0 and SHARF II48 for 10% probability intervals.

 


   Discussion
 
The SHARF score for the prediction of hospital mortality, first developed in a single-centre study [8], was tested in a new and different cohort of intensive care patients in eight centres, including the original centre. The aim of the study was to evaluate the original model in different populations and, if necessary, to modify it. Indeed, we found that there clearly was a different case mix in these centres compared with the original single centre, and that these variations can have significant impacts on the performance of the scoring model [18,19].

In the present multicentre study, the performance of the SHARF score developed in the original study deteriorated compared with the original single-centre results (ROC values of 0.67 and 0.78 compared with 0.87 and 0.90). Nevertheless, even these results compare favourably with those of similar efforts reported in the literature [1,20]. Douma et al. compared 11 mortality prediction models [20]. They found that, of all ARF scores, only the Liano model [17] had an ROC value of 0.78. Halstenberg et al., testing different models in their own study population, found a low performance [1]. In our series, the ROC values for the Liano model [17] and the APACHE II scores [9] were lower than for the SHARF scores. The goodness-of-fit was acceptable for the Liano scores but not for APACHE II. This lack of fit can be explained by the large number of cardiac surgery patients in our cohort (n = 74). As in the original APACHE II publication, these patients had better than expected prognoses and were not included in the analysis [9].

In an attempt to overcome these problems, we performed a new multivariate analysis of the multicentre data in order to identify the best additional subset of parameters that help to explain the mortality in this cohort. Accordingly, adding other parameters to those in the original scoring yielded clearly higher values for r2, the ROC and the Hosmer–Lemeshow goodness-of-fit statistics. Because there were not enough patients to divide the subjects randomly into a developmental and a validation sample (the split-sample method), the method of subgroup analysis was used to test the new customized scoring [21]. Therefore, the same tests of calibration and discrimination were performed for individual centre populations and subpopulations according to types and settings of renal failure. For most of the centres and the subgroups, the discriminative powers, as expressed by ROC values, were comparable with the overall results. Additionally, the SMRs confirmed the good ‘fit’ of the predicted mortality with the observed mortality in most centres and subgroups, but there was a large degree of ‘uncertainty’ as expressed by the 95% CIs [22].

The question remains whether other outcomes, such as long-term survival and quality of life, which are probably more relevant to individual patients, can be predicted by any model developed to date [20]. This can only be answered at a later stage. In our study, we compared the SHARF II score with the 1-year survival rate (manuscript in preparation).

The SHARF II score at T48 predicted mortality better than at diagnosis (T0), as had already been found in the developmental phase of the original model [8]. The discriminative power of the SHARF II in the individual centres and subgroups of this study confirms the improvement of its performance at T48. In general studies of outcomes in intensive care settings, some authors have recently paid attention to repeated scores [23,24]. This novel approach of adding a second measuring point to improve the accuracy of the prediction increases the possibility of using the score to make predictions for individual patients.

We conclude that our original scoring model with two measuring points, developed in our single-centre study, showed an acceptable performance in this multicentre evaluation. The predictive value of the T48 scores is as good as the best results published to date.

However, customizing the original scoring model for the multicentric population of this study makes this model for predicting hospital mortality in ARF useful in different ICU settings. The use of the score to compare centres, for quality assessment and for clinical trials, is certainly justified. Although the predictive abilities of this method are very good, we are reluctant to recommend its use for individual patients over the entire range of possibilities. The performance of the model in patients with very high or very low probability of death, however, makes the score suitable for prognostication in such patients.



   Appendix. Calculation of customized SHARF II scores
 
Original SHARF score [8]:

[Equations 1a and 1b: formulae not recoverable from this copy]

Age decade (agedec), albumin category (albcat) and prothrombin time category (PTcat) are expressed as categories, respiratory support (vent) and heart failure (CHF) are presented as absent (0) or present (1).

The result of the new multivariate analysis at T0 (diagnosis) was:

[Equations 2a and 2b: formulae not recoverable from this copy]

Bilirubin (bili) is expressed as mg/dL, sepsis (seps) and hypotension (hypo) are presented as absent (0) or present (1).

Substituting SHARF0 in equation 2a by equation 1a and algebraic transformation gives the following result:

[Equations 3a and 3b: formulae not recoverable from this copy]



   Acknowledgments
 
The authors thank A. Rombaut for excellent clinical monitoring of the study. The study was made possible by the commitment and cooperation of the following investigators who, in addition to the authors, collected the data in the participating centres: J. Berwaerts, General Hospital Middelheim, J. Bierens, General Hospital Stuivenberg, C. Claessens, St Jozef Hospital Turnhout, H. Demey, University Hospital Antwerp, I. Demeyer, O.L. Vrouw Hospital, E. J. Huygens, Free University Brussels, J. Nagler, General Hospital Middelheim, G. Nollet, O.L. Vrouw Hospital, P. Peeters, Free University Brussels, D. Van Caesbroeck, St Jozef Hospital Turnhout and P. Zachée, General Hospital Stuivenberg. The study was supported by the departmental fund F001 of the Department of Nephrology-Hypertension.

Conflict of interest statement. None declared.



   References
 

  1. Halstenberg WK, Goormastic M, Paganini EP. Validity of four models for predicting outcome in critically ill acute renal failure patients. Clin Nephrol 1997; 47: 81–86
  2. Thadhani R, Pascual M, Bonventre J. Acute renal failure. N Engl J Med 1996; 334: 1448–1460
  3. Liano F. Severity of acute renal failure; the need of measurement. In: Proceedings of the 3rd International Satellite Symposium on ARF, Halkidiki. 1993: 181–192
  4. Chew SL, Lins RL, Daelemans R et al. Outcome in acute renal failure. Nephrol Dial Transplant 1993; 8: 101–107
  5. Wendon J, Smithies M, Sheppard M et al. Continuous arteriovenous high volume venous–venous hemofiltration in acute renal failure. Int Care Med 1989; 15: 358–363
  6. Schaefer JH, Jochimsen F, Keller K et al. Outcome prediction of acute renal failure in medical intensive care. Int Care Med 1991; 17: 19–24
  7. Lemeshow S, Klar J, Teres D et al. Mortality probability models for patients in the intensive care unit for 48 or 72 h; a prospective multicenter study. Crit Care Med 1994; 22: 1351–1358
  8. Lins R, Elseviers M, Daelemans R et al. Prognostic value of a new scoring system for hospital mortality in acute renal failure. Clin Nephrol 2000; 53: 10–17
  9. Knaus WA. Apache II: a severity of disease classification system. Crit Care Med 1985; 13: 818–829
  10. Knaus WA, Draper EA, Wagner DP et al. Prognosis in acute organ system failure. Ann Surg 1985; 202: 685–693
  11. Bone RC, Balk RA, Cerra FB et al. Definitions for sepsis and organ failure and guidelines for the use of innovative therapies in sepsis. Chest 1992; 101: 1644–1655
  12. Muckart DJ, Bhagwanjee S. American College of Chest Physicians/Society of Critical Care Medicine consensus conference definitions of the systemic inflammatory response syndrome and allied disorders in relation to critically injured patients. Crit Care Med 1997; 25: 1789–1795
  13. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143: 29–36
  14. Le Gall J-R, Klar J, Lemeshow S et al. The logistic organ dysfunction system. J Am Med Assoc 1996; 276: 802–811
  15. Lemeshow S, Hosmer DW. A review of goodness of fit statistics for use in the development of logistic regression models. Am J Epidemiol 1982; 115: 92–106
  16. Jencks SF, Daley J, Draper D et al. Interpreting hospital data: the role of clinical risk adjustment. J Am Med Assoc 1988; 260: 3611–3616
  17. Liano F, Pascual J, Garcia-Martin F et al. Prognosis of acute tubular necrosis: an extended prospectively contrasted study. Nephron 1993; 63: 21–31
  18. Cullen D, Chernow B. Predicting outcome in critically ill patients. Crit Care Med 1994; 22: 1345–1348
  19. Moreno R, Miranda DR, Fidler V et al. Evaluation of two outcome prediction models on an independent database. Crit Care Med 1998; 26: 50–61
  20. Douma CE, Redekop WK, Van Der Meulen JH et al. Predicting mortality in intensive care patients with acute renal failure treated with dialysis. J Am Soc Nephrol 1997; 8: 111–117
  21. Schuster DP. Predicting outcome after ICU admission. The art and science of assessing risk. Chest 1992; 102: 1861–1870
  22. Braitman LE, Davidoff F. Predicting clinical states in individual patients. Ann Intern Med 1996; 125: 406–412
  23. Marshall JC, Cook DJ, Christou NV et al. Multiple Organ Dysfunction Score: a reliable descriptor of a complex clinical outcome. Crit Care Med 1995; 23: 1638–1652
  24. Vincent JL, Moreno R, Takala J et al. The SOFA (sepsis related organ failure assessment) score to describe organ dysfunction/failure. Intensive Care Med 1996; 22: 707–710
Received for publication: 8.10.02
Accepted in revised form: 23.4.04