1 Department of Nephrology-Hypertension, ACZA Campus Stuivenberg, Lange Beeldekensstraat 267, B-2060 Antwerpen, 2 Department of Nephrology-Hypertension, University Hospital Antwerp, Wilrijkstraat 10, B-2650 Edegem/Antwerpen, 3 Department of Nephrology-Hypertension, Sint Jozefziekenhuis, Steenweg op Merksplas 44, B-2300 Turnhout, 4 Department of Nephrology-Hypertension, O.L.-Vrouw Ziekenhuis, Moorselbaan 164, B-9300 Aalst, 5 Department of Intensive Care, General Hospital Middelheim, Lindendreef 1, B-2020 Antwerpen and 6 Department of Nephrology-Hypertension, Free University Brussels (VUB), Laarbeeklaan 101, B-1090 Brussels, Belgium
Correspondence and offprint requests to: Robert Lins, MD, PhD, Department of Nephrology-Hypertension, ACZA Campus Stuivenberg, Lange Beeldekensstraat 267, B-2060 Antwerpen, Belgium. Email: robert.lins{at}pro.tiscali.be
Abstract
Methods. The study involved eight intensive care units. We studied 293 consecutive patients with ARF over 6 months; their mortality was 50.5%. The cause of ARF was medical in 184 (63%) patients and surgical in 108 (37%). In the surgical group, 74 (69%) patients had undergone cardiac surgery and 19 (18%) vascular surgery.
Results. Because the performance of the original SHARF scores was much lower in the multicentre study than in the original single-centre study, we re-analysed the multicentre data to customize the original model for the population studied. The independent variables were the score developed in the original study plus all additional parameters that were significant on univariate analysis. The new multivariate analysis identified three additional parameters for inclusion in the model (serum bilirubin, sepsis and hypotension). For the modified SHARF II score, r² was 0.27 at 0 h and 0.33 at 48 h, the areas under the receiver operating characteristic (ROC) curve were 0.82 and 0.83, and the Hosmer–Lemeshow goodness-of-fit P values were 0.19 and 0.05, respectively.
Conclusion. After customization, and using two scoring time points, this prediction model for hospital mortality in ARF is useful in different settings for comparing groups of patients and centres, for quality assessment and in clinical trials. We do not recommend its use for individual patient prognosis.
Keywords: acute renal failure; intensive care unit; mortality prediction; prognosis; prospective multicentre study; severity score
Introduction
Acute renal failure (ARF) is a disease in which risk modelling is important because of its high mortality rate of 50%, which is even higher in certain populations [2]. In general, however, scoring systems predict only the prognosis of a group of patients; they are useful for research but unlikely to be useful in assessing the prognosis of an individual patient [3]. In existing scoring systems, the parameters were obtained only once, after hospitalization or after recognition of ARF. Although it is important for intensive care unit (ICU) physicians and nephrologists to predict mortality within the first 24 h after hospitalization, some authors have argued that relying on a single point in time can be misleading [4–7].
In the developmental phase of our scoring system, we performed a prospective single-centre cohort study of 197 adult patients admitted consecutively to a medical ICU. Relevant parameters were documented at 0 and 48 h, with the aim of developing a score usable in different centres and for different causes of ARF [8]. The resulting scoring protocol (Stuivenberg Hospital Acute Renal Failure, SHARF) has two measuring points, one at the diagnosis of ARF and the other 48 h later. The same five parameters (age, serum albumin, prothrombin time, respiratory support and heart failure) were included at both time points, but with different weights. The r² values of the scores at 0 and 48 h were 0.36 and 0.44, meaning that the model explained 36 and 44%, respectively, of the variance in mortality. The receiver operating characteristic (ROC) values at 0 and 48 h were 0.87 and 0.90 [8].
The model was next tested in the present prospective multicentre study in eight ICUs located in different medical and surgical settings and with different case mixes of patients.
Methods
ARF was defined as a serum creatinine of >2.0 mg/dl (176.8 µmol/l) in patients with previously normal kidney function or, in patients with known pre-existing mild to moderate renal disease, as an increase in creatinine of >50% above baseline. Patients with known chronic renal disease and a serum creatinine of >3 mg/dl (265 µmol/l) before any acute deterioration, or with clearly reduced kidney size on the first ultrasound examination after admission, did not qualify for the study. In patients with hydronephrosis, the disease was considered acute if renal function recovered by >50%.
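The creatinine thresholds in this definition can be expressed as a small check. The sketch below is a hypothetical helper, not part of the study protocol: it encodes only the creatinine criteria (the kidney-size and hydronephrosis rules are left out) and assumes the standard conversion factor of 88.4 µmol/l per mg/dl.

```python
from typing import Optional

# Standard conversion factor: 1 mg/dl creatinine = 88.4 µmol/l.
CREATININE_UMOL_PER_MG_DL = 88.4


def mg_dl_to_umol_l(creatinine_mg_dl: float) -> float:
    """Convert serum creatinine from mg/dl to µmol/l."""
    return creatinine_mg_dl * CREATININE_UMOL_PER_MG_DL


def meets_arf_creatinine_criteria(
    current_mg_dl: float,
    baseline_mg_dl: Optional[float] = None,
    known_renal_disease: bool = False,
) -> bool:
    """Hypothetical check of the creatinine part of the ARF definition.

    Previously normal kidney function: creatinine > 2.0 mg/dl.
    Known mild-to-moderate renal disease: rise of > 50% above baseline,
    with patients whose baseline creatinine exceeds 3 mg/dl excluded.
    """
    if not known_renal_disease:
        return current_mg_dl > 2.0
    if baseline_mg_dl is None:
        raise ValueError("a baseline creatinine is needed for known renal disease")
    if baseline_mg_dl > 3.0:
        return False  # chronic renal disease above the exclusion threshold
    return current_mg_dl > 1.5 * baseline_mg_dl


print(round(mg_dl_to_umol_l(2.0), 1))  # 176.8
print(meets_arf_creatinine_criteria(2.4))  # True
print(meets_arf_creatinine_criteria(2.0, baseline_mg_dl=1.2, known_renal_disease=True))  # True
```

The same conversion gives 3 mg/dl ≈ 265 µmol/l for the exclusion threshold.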
The first set of data was collected on the first day that the criteria for the definition of ARF were met, the day of diagnosis (T0). For patients referred later in the course of their illness to the ICU, the day of admission to the ICU was taken as the starting day (T0). The second measurement point was 48 h later (T48).
The time of diagnosis of ARF in relation to hospital admission was defined as initial if the inclusion criteria were met at the time of admission to the ICU, and as delayed if the criteria were met during the ICU stay or if ARF developed while the patient was on a regular ward before transfer to the ICU. The type of ARF, its cause and the primary disease setting were determined exactly as in the original single-centre study [8], and the same clinical and laboratory parameters were documented at T0 and T48. The overall severity of illness was assessed with the APACHE II score [9], determined on the day of the first data collection for the score (T0). Relevant definitions of clinical parameters are given in Table 1. Organ failure was defined according to published criteria [10–12].
Each of the eight participating institutions reported every eligible patient to the coordinating centre within 24 h after diagnosis, preferably by sending the study reporting form completed with the admission data. After completion of the T48 data, the form was sent to the study monitor within 1 week after diagnosis. Corrections and additional information were requested when necessary. All laboratory data were reported in the units of the local laboratory and converted to the units of the coordinating centre.
Statistical methods
All the prospectively collected parameters were tested for significance with univariate analysis using the Student t-test and the χ² test.
First, all recorded parameters were compared between centres, to evaluate the quality of data collection and differences in patient populations and interpretation of parameters.
Secondly, the performance of the original model with values for T0 and T48 (SHARF0 and SHARF48) [8] was evaluated. The squared correlation coefficient (r²) of the linear regression analysis was used to assess the explanatory power of the model, and the areas under the ROC curves were used to judge its discriminative ability [13]. The agreement (fit) between score and outcome was determined by comparing observed with expected mortality using the Hosmer–Lemeshow goodness-of-fit test [15]. This test compares observed and predicted frequencies across up to 10 cells; a small P value indicates that the predicted values do not fit the data.
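The three performance criteria can be sketched in plain Python. This is a minimal illustration, not the software used in the study: r² is taken as the squared Pearson correlation between score and 0/1 outcome, the ROC area is computed in its Mann–Whitney form, and the Hosmer–Lemeshow cells are formed as equal-size groups of patients sorted by predicted risk (one common variant; the grouping used in the study may differ). The chi-square tail is computed from a power series to avoid third-party dependencies, and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def r_squared(scores: Sequence[float], died: Sequence[int]) -> float:
    """r² of a simple linear regression of the 0/1 outcome on the score,
    i.e. the squared Pearson correlation between the two."""
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(died) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, died))
    sxx = sum((x - mean_x) ** 2 for x in scores)
    syy = sum((y - mean_y) ** 2 for y in died)
    return sxy * sxy / (sxx * syy)


def roc_auc(scores: Sequence[float], died: Sequence[int]) -> float:
    """Area under the ROC curve: probability that a random non-survivor
    scores higher than a random survivor (Mann-Whitney form; ties count 1/2)."""
    dead = [s for s, y in zip(scores, died) if y == 1]
    alive = [s for s, y in zip(scores, died) if y == 0]
    wins = 0.0
    for d in dead:
        for a in alive:
            if d > a:
                wins += 1.0
            elif d == a:
                wins += 0.5
    return wins / (len(dead) * len(alive))


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability, 1 - P(df/2, x/2), where P is the
    regularized lower incomplete gamma function summed as a power series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    lower, n = 0.0, 0
    while True:
        # n-th series term e^-z * z^(a+n) / Gamma(a+n+1), in log space
        term = math.exp((a + n) * math.log(z) - z - math.lgamma(a + n + 1))
        lower += term
        n += 1
        if n > z - a and term < 1e-16:  # past the largest term and negligible
            break
    return max(0.0, 1.0 - lower)


def hosmer_lemeshow(probs: Sequence[float], died: Sequence[int],
                    groups: int = 10) -> Tuple[float, float]:
    """Hosmer-Lemeshow statistic over `groups` cells of roughly equal size,
    formed after sorting patients by predicted risk; P value on groups - 2 df."""
    ordered = sorted(zip(probs, died), key=lambda pair: pair[0])
    n = len(ordered)
    stat = 0.0
    for g in range(groups):
        cell = ordered[g * n // groups:(g + 1) * n // groups]
        if not cell:
            continue
        expected = sum(p for p, _ in cell)
        observed = sum(y for _, y in cell)
        # Deaths and survivors combined: (O - E)^2 / (E * (1 - E / n_cell))
        denom = expected * (1.0 - expected / len(cell))
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat, chi2_sf(stat, groups - 2)
```

With 10 cells the statistic is referred to a chi-square distribution with 8 degrees of freedom, so a P value below 0.05 corresponds to a statistic above about 15.5.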
Thirdly, because the model clearly performed worse in this population than in the original one, we re-analysed the data to identify the best additional subset of variables with which to customize the original model for the new population. The independent variables entered in the new multivariate analysis were the SHARF scores developed in the original single-centre study plus all other parameters that were significant on univariate analysis. Each parameter was assigned a weight based on its coefficient in the linear regression analysis. Parameters with skewed distributions were divided into categories to neutralize the effect of outliers, and age was divided into decades to simplify the formulae for bedside calculation. The coefficients were multiplied by 100 and rounded to obtain whole numbers. The final adapted scores at T0 (SHARF II0) and at T48 (SHARF II48) were calculated by summing the weights associated with each parameter. The significant contributors to mortality in the linear model were exactly the same as in the logistic model, as had also been observed in the original single-centre study [8]. The probability of in-hospital mortality was calculated using the score as the single parameter in a logistic regression equation, $\text{logit} = \beta_0 + \beta_1 \times \text{score}$. The logit was then converted to a probability of hospital mortality as $\Pr(y = 1 \mid \text{logit}) = e^{\text{logit}}/(1 + e^{\text{logit}})$, where y is 1 for patients who died and 0 for patients who survived, Pr denotes probability and e is the base of the natural logarithm [14]. Risk ratios were calculated for the parameters found to be significant contributors. The adapted scores (SHARF II0 and SHARF II48) were then tested with the r², ROC and Hosmer–Lemeshow statistics, as had been done for the original scores.
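The weighting and probability steps can be sketched as follows. This is a hypothetical illustration: the coefficient dictionary and the logistic intercept and slope (beta0, beta1) are placeholders, because the fitted logistic coefficients behind the mortality curve are not given in the text. Note that Python's round() sends exact halves to the nearest even integer, which may differ from the rounding the authors used.

```python
import math
from typing import Dict


def integer_weights(coefficients: Dict[str, float]) -> Dict[str, int]:
    """Multiply regression coefficients by 100 and round to whole numbers."""
    return {name: round(beta * 100) for name, beta in coefficients.items()}


def score_from_weights(weights: Dict[str, int], values: Dict[str, float]) -> float:
    """Sum of weight x parameter value (category code, 0/1 flag or age decade)."""
    return sum(weights[name] * values[name] for name in weights)


def mortality_probability(score: float, beta0: float, beta1: float) -> float:
    """Logistic model with the score as its single predictor:
    logit = beta0 + beta1 * score; Pr(death) = e^logit / (1 + e^logit)."""
    logit = beta0 + beta1 * score
    if logit >= 0:
        # Algebraically identical form that cannot overflow for large logits
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


# Hypothetical coefficients and logistic parameters, for illustration only
weights = integer_weights({"sepsis": 0.27, "hypotension": 0.21})
print(weights)  # {'sepsis': 27, 'hypotension': 21}
print(mortality_probability(50.0, beta0=-5.0, beta1=0.1))  # 0.5
```

Splitting on the sign of the logit keeps the conversion numerically stable while computing exactly the published expression.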
Fourthly, the model was also tested in individual centres that included at least 30 patients and in the following subpopulations: pre-renal causes vs acute tubular necrosis; medical vs surgical patients; dialysed vs non-dialysed patients; and initial vs delayed diagnosis of ARF. For the different ICUs and subpopulations, the standardized mortality ratio (SMR; the ratio of the observed number of deaths to the number predicted by the model) [16] and the ROC values were calculated with their 95% confidence intervals (CIs).
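One way to compute these quantities is sketched below, assuming that expected deaths are the sum of the model's predicted probabilities. The interval methods are assumptions and not necessarily those of reference [16]: the SMR interval uses a normal approximation with the model-based variance Σp(1 − p) of the number of deaths, and the ROC interval uses the Hanley–McNeil standard error. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def smr_with_ci(predicted: Sequence[float], died: Sequence[int],
                z: float = 1.96) -> Tuple[float, float, float]:
    """SMR = observed deaths / expected deaths (sum of predicted probabilities).

    The CI is a normal approximation whose variance, sum of p(1 - p), is the
    model-based variance of the number of deaths (an assumed method).
    """
    expected = sum(predicted)
    observed = sum(died)
    smr = observed / expected
    se = math.sqrt(sum(p * (1.0 - p) for p in predicted)) / expected
    return smr, max(0.0, smr - z * se), smr + z * se


def auc_ci_hanley_mcneil(auc: float, n_died: int, n_survived: int,
                         z: float = 1.96) -> Tuple[float, float]:
    """Approximate CI for an ROC area using the Hanley-McNeil standard error."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_died - 1) * (q1 - auc * auc)
           + (n_survived - 1) * (q2 - auc * auc)) / (n_died * n_survived)
    se = math.sqrt(var)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)
```

With illustrative group sizes of 148 deaths and 145 survivors, an ROC value of 0.82 gets an interval of roughly 0.77 to 0.87; the much smaller single-centre subgroups give correspondingly wider intervals.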
Finally, the original and the newly designed scores were compared with the Acute Tubular Necrosis Severity Index of Liano [17] and the APACHE II score [9].
![]() |
Results
In univariate analysis, 20 other parameters differed significantly between survivors and non-survivors at T0 or T48 (Table 3). Using the original formulae, r² decreased to values <0.25 and the areas under the ROC curve to <0.80.
The final scoring formulae were:
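$$
\begin{aligned}
\text{SHARF II}_{0} ={}& 3.0 \times \text{age decade} + 2.6 \times \text{serum albumin category}_{T0} + 1.3 \times \text{prothrombin time category}_{T0} \\
&+ 16.8 \times \text{respiratory support}_{T0} + 3.9 \times \text{heart failure}_{T0} + 2.8 \times \text{serum bilirubin}_{T0} \\
&+ 27 \times \text{sepsis}_{T0} + 21 \times \text{hypotension}_{T0} - 17
\end{aligned}
$$

$$
\begin{aligned}
\text{SHARF II}_{48} ={}& 3.9 \times \text{age decade} + 3.3 \times \text{serum albumin category}_{T0} + 1.7 \times \text{prothrombin time category}_{T0} \\
&+ 23.7 \times \text{respiratory support}_{T48} + 8.8 \times \text{heart failure}_{T48} + 2.5 \times \text{serum bilirubin}_{T48} \\
&+ 24 \times \text{sepsis}_{T48} + 17 \times \text{hypotension}_{T0} - 28
\end{aligned}
$$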
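For bedside or spreadsheet use, the two formulas translate directly into a weighted sum. In this sketch the trailing constants are assumed to be subtracted (−17 and −28), since the minus signs appear to have been lost in the extracted text; the dictionary keys are hypothetical, and every input must already be coded as the study specifies (age in decades, category codes, 0/1 for yes/no items), a coding that is not reproduced here.

```python
from typing import Dict

# Coefficients transcribed from the SHARF II formulas. The constants are
# ASSUMED to be subtracted. Key names are hypothetical.
SHARF_II_T0 = {
    "age_decade": 3.0,
    "albumin_category_t0": 2.6,
    "prothrombin_category_t0": 1.3,
    "respiratory_support_t0": 16.8,
    "heart_failure_t0": 3.9,
    "bilirubin_t0": 2.8,
    "sepsis_t0": 27.0,
    "hypotension_t0": 21.0,
}
SHARF_II_T0_CONSTANT = -17.0

SHARF_II_T48 = {
    "age_decade": 3.9,
    "albumin_category_t0": 3.3,
    "prothrombin_category_t0": 1.7,
    "respiratory_support_t48": 23.7,
    "heart_failure_t48": 8.8,
    "bilirubin_t48": 2.5,
    "sepsis_t48": 24.0,
    "hypotension_t0": 17.0,  # the published T48 formula uses the T0 value here
}
SHARF_II_T48_CONSTANT = -28.0


def linear_score(coefficients: Dict[str, float], constant: float,
                 patient: Dict[str, float]) -> float:
    """Weighted sum of the coded parameters plus the constant term."""
    missing = sorted(set(coefficients) - set(patient))
    if missing:
        raise KeyError(f"missing parameters: {missing}")
    return constant + sum(w * patient[name] for name, w in coefficients.items())


patient = {  # hypothetical coded values for one patient
    "age_decade": 6, "albumin_category_t0": 2, "prothrombin_category_t0": 1,
    "respiratory_support_t0": 1, "heart_failure_t0": 0, "bilirubin_t0": 1,
    "sepsis_t0": 1, "hypotension_t0": 0, "respiratory_support_t48": 1,
    "heart_failure_t48": 0, "bilirubin_t48": 1, "sepsis_t48": 0,
}
print(round(linear_score(SHARF_II_T0, SHARF_II_T0_CONSTANT, patient), 1))  # 54.1
print(round(linear_score(SHARF_II_T48, SHARF_II_T48_CONSTANT, patient), 1))  # 29.9
```

Raising an error for missing inputs, rather than treating them as zero, avoids silently underestimating the score.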
The data on the parameters included in the model are shown in Table 4, and the calculation of the score is described in the Appendix. The mean SHARF II scores (95% CIs) were 36 (33–40) for survivors and 63 (59–67) for non-survivors at T0, and 32 (28–36) and 67 (61–72), respectively, at T48. The predicted probability of death for any possible score is given in Figure 1.
Discussion
In the present multicentre study, the performance of the SHARF score developed in the original study deteriorated compared with the original single-centre results (ROC values of 0.67 and 0.78 compared with 0.87 and 0.90). Nevertheless, even these results compare favourably with those of similar efforts reported in the literature [1,20]. Douma et al. compared 11 mortality prediction models [20] and found that, of all ARF scores, only the Liano model [17] had an ROC value of 0.78. Halstenberg et al., testing different models in their own study population, found poor performance [1]. In our series, the ROC values for the Liano model [17] and the APACHE II score [9] were lower than those for the SHARF scores. Goodness-of-fit was acceptable for the Liano score but not for APACHE II. This lack of fit may be explained by the large number of cardiac surgery patients in our cohort (n = 74): in the original APACHE II publication, such patients had better than predicted prognoses and were excluded from the analysis [9].
In an attempt to overcome these problems, we performed a new multivariate analysis of the multicentre data to identify the best additional subset of parameters to explain mortality in this cohort. Adding these parameters to those of the original score clearly improved r², the ROC values and the Hosmer–Lemeshow goodness-of-fit statistics. Because there were not enough patients to divide the subjects randomly into a developmental and a validation sample (the split-sample method), subgroup analysis was used to test the new customized score [21]. The same tests of calibration and discrimination were therefore performed for individual centre populations and for subpopulations defined by type and setting of renal failure. For most centres and subgroups, the discriminative power, expressed as ROC values, was comparable with the overall results. The SMRs also confirmed a good fit between predicted and observed mortality in most centres and subgroups, although with a large degree of uncertainty, as shown by the 95% CIs [22].
The question remains whether other outcomes, such as long-term survival and quality of life, which are probably more relevant to individual patients, can be predicted by any model developed to date [20]. This can only be answered at a later stage. In our study, we have related the SHARF II score to 1-year survival (manuscript in preparation).
The SHARF II score at T48 predicted mortality better than at diagnosis (T0), as had already been found in the developmental phase of the original model [8]. The discriminative power of SHARF II in the individual centres and subgroups of this study confirms its improved performance at T48. In general outcome studies in intensive care, some authors have recently examined repeated scoring [23,24]. This novel approach of adding a second measuring point to improve predictive accuracy increases the possibility of using the score to make predictions for individual patients.
We conclude that our original scoring model with two measuring points, developed in a single-centre study, showed acceptable performance in this multicentre trial. The predictive value of the T48 scores is as good as the best results published to date.
However, customizing the original scoring model for the multicentre population of this study makes it useful for predicting hospital mortality in ARF in different ICU settings. Its use to compare centres, for quality assessment and in clinical trials is certainly justified. Although the predictive ability of the method is very good, we are reluctant to recommend its use for individual patients across the entire range of predicted risk. Its performance in patients with a very high or very low probability of death, however, makes the score suitable for prognostication in such patients.
Appendix. Calculation of customized SHARF II scores
[Equation 1a]
[Equation 1b]
The result of the new multivariate analysis at T0 (diagnosis) was:
[Equation 2a]
[Equation 2b]
Substituting equation 1a for SHARF0 in equation 2a and rearranging algebraically gives the following result:
[Equation 3a]
[Equation 3b]
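The numbered equations are images that are not available in this version of the text. As a generic illustration of the substitution step only, with placeholder symbols rather than the published coefficients (x_i for the five original parameters with weights w_i and constant c_0; z_j for the added parameters bilirubin, sepsis and hypotension; a, b_j and d as hypothetical regression constants), replacing a linear score by its defining sum inside a second linear model yields another linear score in the individual parameters:

$$
\begin{aligned}
\text{SHARF}_0 &= \sum_{i} w_i x_i + c_0 \\
\text{SHARF II}_0 &= a\,\text{SHARF}_0 + \sum_{j} b_j z_j + d \\
&= \sum_{i} (a\,w_i)\, x_i + \sum_{j} b_j z_j + (a\,c_0 + d)
\end{aligned}
$$

This is consistent with the final formulas listing the original parameters with rescaled weights alongside the added ones and a single constant term; the same reasoning applies at T48.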
Acknowledgments
Conflict of interest statement. None declared.
References