Biostatistics Group, School of Epidemiology & Health Sciences, University of Manchester, UK
Department of Medicine and Public Health, Section of Psychiatry, University of Verona, Italy
Correspondence: Professor Graham Dunn, Biostatistics Group, School of Epidemiology & Health Sciences, Stopford Building, Oxford Road, Manchester M13 9PT, UK. E-mail: g.dunn@man.ac.uk
ABSTRACT
Aims To encourage both investigators of the variation in health care costs and the consumers of their investigations to think more critically about the precise aims of these investigations and the choice of statistical methods appropriate to achieve them.
Method We briefly describe examples of regression models that might be of use in the prediction of mental health costs and how one might choose which one to use for a particular research project.
Conclusions If the investigators are primarily interested in explanatory mechanisms then they should seriously consider generalised linear models (but with careful attention being paid to the appropriate error distribution). Further insight is likely to be gained through the use of two-part models. For prediction we recommend regression on raw costs using ordinary least-squares methods. Whatever method is used, investigators should consider how robust their methods might be to incorrect distributional assumptions (particularly in small samples) and they should not automatically assume that methods such as bootstrapping will allow them to ignore these problems.
INTRODUCTION
Background
The purpose of this review is to enhance readers' ability to
understand and appraise research papers and other reports on the prediction of
mental health care costs, paying particular attention to the statistical
methodology, in terms of choice of model, and to evaluation of the likely
future performance of the chosen predictive model. Although we would not
expect the typical reader of this journal to be fully aware of the technical
pitfalls of analysing costs data, in our view it is vital that, as in the
critical appraisal of other research evidence, readers are familiar with the
main issues and how the authors' interpretations of the results of such
studies might be misleading or mistaken. Whenever possible, we wish to be able
to make our own judgements as to the quality of a piece of research rather
than having to take the views of experts on trust. Other topics,
such as methods of patient selection and methodological problems concerning
measurement of the actual costs of care for individual patients, are extremely
important but we will not attempt to discuss these in detail here. Many of the
problems concerning the selection of patients to study are similar to those
that are the usual concerns of anyone wishing to make a critical appraisal of
prognosis studies and we therefore refer readers to the relevant literature in
this area (Sackett et al,
1991).
Our own interest in the appraisal of the validity of many past studies of health care costs and a recent review by Diehr et al (1999) have prompted us to question whether the methods currently available for modelling or predicting health care costs, other than ordinary least-squares regression of logged costs, are widely known in the mental health field. We are not aware of an elementary discussion of the relevant methodologies but there is a useful study illustrating most of the methodological problems in the context of analysis of variation in mental health care costs (Kilian et al, 2002). Although it covers most of the same ground as the present paper, the discussion by Kilian et al is technically more difficult than the one presented here. The goal of this review is to make these methods more widely accessible to non-specialists and, in particular, to the consumers of the resulting research findings.
Our intention is to describe and explain the competing methods as clearly as possible while keeping the technical details to the minimum necessary for this objective. We will use little mathematics, restricting most of it to the definition of the various indices of the predictive power of the competing models. We hope that the present review can be read and understood by clinicians and other mental health workers, although we would hope that it might also provide a good starting point for statisticians and health economists who do not have experience or specialist knowledge of econometric modelling.
DEFINING THE GOALS OF THE STUDY
To summarise, goals need to be defined precisely and the statistical methods should be chosen to fulfil these goals. A model and corresponding fitting method might be optimal for one particular goal but not the most effective for another. The optimal choice of methodology should be dependent upon the authors' chosen (and explicitly stated) aims. A given statistical model might be good as an explanatory device but poor as a tool for forecasting, or vice versa. In practice, however, the choice of statistical model might not matter too much (i.e. the results of the analysis are fairly insensitive or robust to model choice) but, again, both the authors and readers of studies of health costs need to know whether this is likely to be the case.
DISTRIBUTIONAL PROPERTIES OF COST DATA
A given population (or sample) of patients can often be thought of as a mixture of two types. First, there are those who will incur little, if any, treatment cost: those who attend for assessment, advice or brief support but do not need access to long-term care. They may have only a very minor problem or one that is acute but from which they make a quick and full recovery. Second, there are patients who need varying but non-trivial amounts of treatment and long-term care. These are the patients who may incur quite modest yearly health care costs but need very expensive long-term care and support. Thus, the first question faced by the statistical modeller, whether interested in explanation or forecasting, is whether to try to take this heterogeneity of the patients (i.e. the group structure) into account. Do we use a one-part model or is it better to use a two-part model? Before trying to answer this question we first need to describe what the two types of model are. We also need a more general discussion on the choice of regression models.
ONE- OR TWO-PART MODELS?
In a one-part model we use a single regression equation to model the costs for everyone in the data-set (i.e. we do not first separate the low-cost and higher-cost groups, A and B, corresponding to the two types of patient described above). The predicted cost for a given patient with characteristics X is then simply E(Cost|X).
We will assume that the investigator has a clear idea of how to distinguish 'substantial' from 'little or nothing' costs based on his or her knowledge of the population being sampled. But what if it is not at all obvious what the boundary between the two groups might be? What if we are convinced that the population is made up of the two groups A and B but have difficulty assigning group membership to many of the individual patients? It may not be at all clear what the cost cut-off should be in order to discriminate between the two. In this case we might wish to postulate a more subtle version of a two-part model in which group membership remains latent or hidden. This type of model is called a latent class or finite mixture model in the statistical literature. We do not pursue this idea further here but refer the interested reader to Deb & Holmes (2002) for an illustrative example and methodological discussion.
The two-part model (or possibly a model with more than two parts; see Duan et al, 1983) is conceptually much richer than the simpler one-part model. For this reason it is likely to provide more insight concerning the ways in which costs arise. Diehr et al (1999) comment:

'When the goal is understanding the system, a two-part model seems best because it permits the investigator to distinguish factors that affect the propensity to use any services from factors that affect volume of utilisation once the person has entered the system... For understanding the effect of individual covariates on total costs, a one-part model is most useful because it generates a single regression coefficient for each variable and so can be interpreted easily.'

We will defer discussion on accuracy of forecasts until later. Before moving on, however, it should be noted that an intelligent data analyst is likely to make a decision concerning the use of a one-part or two-part model at least partly on the basis of his or her prior knowledge concerning the heterogeneity of the population of patients under study and also from the way the sample of patients for analysis has been chosen. The analyst may have deliberately selected a relatively homogeneous subsample of patients prior to any further statistical analyses.
Having chosen which of the two approaches to use, we are still faced with the problem of how to choose an appropriate regression model for either total costs (one-part model) or costs in those that enter the system (two-part model). This is the subject of the following section. Readers wishing to read more on two-part modelling are referred to Duan et al (1983, 1984), Mullahy (1998) and the review of Diehr et al (1999).
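To make the two-part logic concrete, here is a minimal sketch in Python (using the numpy and statsmodels libraries; the names `X`, `cost` and `X_new` are hypothetical, and ordinary least squares is used for the second part purely for illustration; the next section discusses better choices):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: a design matrix X (including an intercept column)
# and a vector of observed costs, many of which may be zero.
def fit_two_part(X, cost):
    """Part 1: logistic regression for incurring any cost at all.
    Part 2: regression for the size of the cost among users only
    (ordinary least squares here purely for illustration)."""
    any_cost = (cost > 0).astype(float)
    part1 = sm.Logit(any_cost, X).fit(disp=0)
    users = cost > 0
    part2 = sm.OLS(cost[users], X[users]).fit()
    return part1, part2

def predict_two_part(part1, part2, X_new):
    # Overall prediction combines the two parts:
    # E(Cost|X) = Pr(Cost > 0 | X) * E(Cost | Cost > 0, X)
    return part1.predict(X_new) * part2.predict(X_new)
```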
CHOICE OF REGRESSION MODEL
If one takes logarithms of the observed cost data, this transformation usually will have two consequences: a considerable reduction in the skewness of the data, although complete symmetry is unlikely to be achieved in practice; and stability of the variance (i.e. the variability of the observed costs will not increase with their mean). Both of these consequences lead to better performance of ordinary least-squares regression methods. Examples of the use of this approach can be found in Amaddeo et al (1998) and Bonizzato et al (2000). The method is (usually) implicitly based on a multiplicative model for the actual costs (including a multiplicative error term). There is a problem if there are observed costs of zero (the logarithm of zero is undefined) but this is often remedied by adding a small constant (unity, for example) prior to the logarithmic transformation. The method seems to work satisfactorily in practice but one should always remember that the aim of the analysis is to evaluate our ability to predict actual costs and not their logarithms. Values of R2 and other indices of concordance of observed and predicted values (see below) must be evaluated using the observed and predicted costs (not their logarithms). More importantly, investigators should be aware of the fact that, even though ordinary least-squares methods produce unbiased estimates of log-costs, the predicted actual costs (and also total costs derived from the individual predictions) will be biased. They will underestimate the true cost. However, bias-reduction methods are available (e.g. the non-parametric method called smearing; see Duan, 1983) so this underestimation is not a serious problem as long as it is recognised by the investigator.
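As a sketch of the retransformation step just described, assuming ordinary least-squares regression of log-costs in Python (statsmodels; the variable names are illustrative), Duan's smearing factor can be applied as follows:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical names: X is a design matrix (with intercept), cost holds
# strictly positive observed costs, X_new the patients to predict for.
def log_ols_predict_smeared(X, cost, X_new):
    """OLS on log(cost); Duan's non-parametric smearing factor then
    corrects the downward bias of naive back-transformation."""
    fit = sm.OLS(np.log(cost), X).fit()
    naive = np.exp(fit.predict(X_new))  # underestimates E(Cost|X)
    smear = np.mean(np.exp(fit.resid))  # smearing estimate of E(exp(error))
    return smear * naive
```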
If the investigator really believes that the relationship between the predictive factors and cost is multiplicative, then it is probably preferable to model this explicitly using an appropriate generalised linear model. In a generalised linear model, the familiar regression equation of the form \(\alpha + \sum_i \beta_i x_i\) is called the linear predictor. But the linear predictor is not necessarily equated with the expected cost, as in multiple regression with the raw data, but is connected to it via a link function. So, for example, we could have a model in which the natural logarithm of the expected costs is equated with the linear predictor:

\[ \log E(\mathrm{Cost} \mid X) = \alpha + \sum_i \beta_i x_i \]

or, equivalently,

\[ E(\mathrm{Cost} \mid X) = \exp\Big(\alpha + \sum_i \beta_i x_i\Big). \]
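Such a log-link model can be fitted on the raw cost scale with standard software. A minimal sketch in Python using statsmodels (the gamma error family here is an assumption made purely for illustration; as stressed elsewhere in this paper, the error distribution deserves careful attention in its own right):

```python
import statsmodels.api as sm

# Hypothetical names as before: X is a design matrix with intercept,
# cost a vector of positive observed costs.
def fit_log_link_glm(X, cost):
    family = sm.families.Gamma(link=sm.families.links.Log())
    fit = sm.GLM(cost, X, family=family).fit()
    return fit  # fit.predict(X_new) gives costs on the raw scale
```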
One very natural extension of the above log-linear generalised linear model is through the use of an offset. Suppose that each patient provided cost data for a different number of years (let this variable be called Years). Instead of modelling total costs, suppose that we were also interested in modelling costs per year:

\[ \log E(\mathrm{Cost}/\mathrm{Years} \mid X) = \alpha + \sum_i \beta_i x_i \]

which is equivalent to

\[ \log E(\mathrm{Cost} \mid X) = \log(\mathrm{Years}) + \alpha + \sum_i \beta_i x_i, \]

where the term log(Years), which has a coefficient fixed at unity rather than estimated from the data, is called the offset.
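In software, the offset is supplied as a term with a fixed coefficient of one. A hedged sketch, extending the previous statsmodels example (the variable `years` is hypothetical):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical: years holds each patient's period of observation.
# log(years) enters the linear predictor with its coefficient fixed at
# one (the offset), so the covariates model costs per year.
def fit_cost_with_offset(X, cost, years):
    family = sm.families.Gamma(link=sm.families.links.Log())
    return sm.GLM(cost, X, family=family, offset=np.log(years)).fit()
```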
ASSESSING THE MODEL'S PERFORMANCE
Perhaps the simplest index is the familiar Pearson product-moment
correlation (R) between predicted and observed costs
(Zheng & Agresti, 2000), but this is far from ideal. It is a measure of association rather than
concordance and it is probably better to use Lin's concordance
coefficient (Rc;
Lin, 1989) or an intraclass
correlation (Ri;
Dunn, 1989). But both of these
indices, as well as the product-moment correlation, are dependent on patient
heterogeneity: they will increase with increases in the variability of
the costs, irrespective of the accuracy of the predictions. Perhaps the most
commonly used index for a multiple regression model is the coefficient
of determination or proportion of variance explained,
R2 (equivalent in this situation to the square of the
product-moment correlation between prediction and observation) usually
obtained from the analysis of variance table. But, again, this is not
particularly useful unless the aim is to discriminate between patients. Like
the above correlations, it is dependent on the heterogeneity of the observed
costs. Despite this potential disadvantage, however, they are obviously useful
for comparison of the performance of various models for the same data.
Problems only arise when we try to compare the performance of predictive
models on different groups. Some authors prefer to use what is called the
adjusted R2, R2a, where

\[ R^2_a = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}, \]

n being the number of patients and p the number of explanatory variables in the model.
The value of this statistic for the latter purpose is, in our opinion, not high; R2a might be useful as an initial gross indicator, but this is all.
The accuracy of a model's predictions is probably best evaluated by a function of the differences between the predicted and observed costs, that is, by a function of (c_o − c_p), where c_o is the observed cost for a given patient and c_p is the corresponding prediction: E(Cost|X). The three obvious choices are the residual mean square (RMS), root-mean-square error (RMSE) and the mean of the absolute error (MAE). A less familiar index is Theil's U-statistic (Theil, 1966; Greene, 2000).
The RMSE is the square root of the mean of the squared differences between
the predicted and observed values of cost, MAE is the mean of the absolute
value of the differences, and RMS is the residual sum of squares divided by
the residual degrees of freedom as obtained from the relevant analysis of
variance table. The square root of the RMS (i.e. the standard deviation of the
residuals) is likely to be close but not identical to the RMSE. Theil's
U-statistic is the square root of the sum of the squared deviations
of the predicted from the observed costs divided by the square root of the sum
of the squared predictions. Algebraically, the less familiar of these indices
are defined as follows:

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(c_{o,i}-c_{p,i})^{2}}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|c_{o,i}-c_{p,i}\right|, \]

\[ U = \frac{\sqrt{\sum_{i=1}^{n}(c_{o,i}-c_{p,i})^{2}}}{\sqrt{\sum_{i=1}^{n}c_{p,i}^{2}}}, \]

where c_{o,i} and c_{p,i} are the observed and predicted costs for the ith patient and the sums run over all n patients in the sample.
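These indices are straightforward to compute directly from the observed and predicted raw costs. A small Python helper (numpy only; the function name is ours):

```python
import numpy as np

def prediction_indices(observed, predicted):
    """RMSE, MAE and Theil's U computed from raw (untransformed) costs."""
    d = observed - predicted
    rmse = np.sqrt(np.mean(d ** 2))  # root-mean-square error
    mae = np.mean(np.abs(d))         # mean absolute error
    theil_u = np.sqrt(np.sum(d ** 2)) / np.sqrt(np.sum(predicted ** 2))
    return {"RMSE": rmse, "MAE": mae, "Theil U": theil_u}
```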
One potential problem, whatever indicator of performance is used, is that if it is used naively it is likely to be overoptimistic. If the explanatory variables in the final model have been chosen using the same data as those used to assess the model's performance, then we are likely to have capitalised on chance associations between potential explanatory variables and the cost outcomes and inevitably will have produced a model that has been overfitted (Greene, 2000). A more realistic evaluation of the performance of the model ideally should be made by cross-validation using a data-set collected from a second, independent sample of patients. Unfortunately, however, we often do not have adequate resources within a particular research project to be able to collect such a data-set, and if we test our model on someone else's data it is unlikely that they will have collected exactly the same information using the same measurement procedures on a comparable sample of patients. A more realistic option is to split our original sample into two, develop the model on one of the subsamples (the so-called training set) and evaluate it using the second one (the validation set). This split-sample or internal approach to cross-validation is the one advocated by Diehr et al (1999) and illustrated in Kilian et al (2002).
One pitfall of the split-sample approach is its inefficient use of the data. Unless we have a very large sample to start with, we are usually loath to use only half of the patients to develop the model and half to test it. Ideally, we would like to maximise the use of the data for both functions. One approach is to take the full sample of n patients and leave each of the patients out in turn. Each time, we derive a model from the n − 1 remaining patients and test its performance on the one that has been left out. This leave-one-out procedure in principle involves n separate analyses, from which we can then produce an overall summary of the model's performance. In practice this will not be necessary, but the technical details are beyond the scope of the present discussion. The text by Mosteller & Tukey (1977) contains a nice introduction to cross-validation methods and Armitage et al (2002: p. 395) provide a brief discussion of variants of the leave-one-out method (see also Picard & Berk, 1990).
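A minimal sketch of the leave-one-out idea in Python (numpy and statsmodels; an OLS model on raw costs is assumed purely for illustration, and in practice the shortcuts alluded to above would avoid refitting the model n times):

```python
import numpy as np
import statsmodels.api as sm

def loo_rmse(X, cost):
    """Leave-one-out cross-validated RMSE for an OLS model on raw costs.
    Naive version: the model is refitted n times, once per patient."""
    n = len(cost)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                  # drop patient i
        fit = sm.OLS(cost[keep], X[keep]).fit()   # refit on the rest
        errors[i] = cost[i] - fit.predict(X[i:i + 1])[0]
    return np.sqrt(np.mean(errors ** 2))
```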
HOW ROBUST ARE THE STATISTICAL METHODS?
It has been argued, for example, that '... such bootstrap techniques can be recommended either as a check on the robustness of standard parametric methods, or to provide the primary statistical analysis when making inferences about arithmetic means for moderately sized samples of highly skewed data such as costs' (Barber & Thompson, 2000b).
Barber & Thompson's claims concerning the robustness of the inferences based on the bootstrap have been challenged recently by O'Hagan & Stevens (2003). They point out that for highly skewed cost data obtained from small samples of patients the sample mean is not the ideal estimator of the required population mean. It is very sensitive to the presence of one or two stragglers with relatively high costs, and inferences based on bootstrapping the sample mean will be equally affected by this problem. They argue that even when the methods advocated by Barber & Thompson are technically valid (in terms of their large-sample properties), in small samples they may lead to inefficient and even misleading inferences. We suspect that this is likely to be an even greater problem for ordinary least-squares-based multiple regression models. O'Hagan & Stevens agree with Barber & Thompson's assertion that we should be concentrating on inferences on untransformed costs (as do we in the present paper), but their main message is that it is important to apply statistical methods (in the present context, model-fitting procedures) that recognise the skewness in cost data.
O'Hagan & Stevens (2003) advocate parametric modelling with realistic error structures. This does not, however, rule out the use of bootstrapping. Having chosen the model-fitting procedure to cope with the distributional characteristics of the data, we can use bootstrapping to obtain standard errors, confidence intervals, etc. O'Hagan & Stevens pursue Bayesian methods, but a viable alternative might be the use of robust model-fitting procedures. These are methods that are not unduly influenced by outlying or extreme observations (Mosteller & Tukey, 1977; Berk, 1990). Note that robust fitting methods should not be confused with robust methods of standard error estimation (the bootstrap, for example) once we have got our best-fitting model. They are complementary and should not be seen as competitors. A recent health economics application of robust model-fitting methodology can be found in Hoch et al (2002).
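As a concrete illustration of the basic resampling idea referred to above, here is a minimal percentile-bootstrap sketch in Python (numpy only; the function name and defaults are illustrative, and this is precisely the simple bootstrap of the sample mean whose small-sample limitations O'Hagan & Stevens highlight):

```python
import numpy as np

def bootstrap_ci_mean(cost, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the arithmetic mean
    cost. As discussed above, this is no panacea: with small, highly
    skewed samples the resampled means inherit the sample mean's
    sensitivity to a few very expensive patients."""
    rng = np.random.default_rng(seed)
    n = len(cost)
    means = np.array([rng.choice(cost, size=n, replace=True).mean()
                      for _ in range(n_boot)])
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])
```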
DISCUSSION
Assessing the performance of the model
We do not recommend the use of standardised indices such as
R2 or Theil's U-statistic to compare the
performance of a model when applied to different groups. The apparent
lack of predictive value for patients in one particular group (group 1), for
example, as opposed to that in another (group 2) may simply be a statistical
artefact caused by the fact that there is less variability in the costs for
the patients in group 2. The performance of the forecasts (as measured by
root-mean-square error or mean absolute error) may, in fact, be better in
group 2 than in group 1. The main advantage of R2 and
Theil's U-statistic is to compare the performance of competing
models within the same group of patients. For comparison of the
performance of models on different groups, we recommend the use of the
root-mean-square error or mean absolute error. Finally, we stress the
importance of cross-validation: how well will the model perform in a
future sample?
Clinical Implications and Limitations
ACKNOWLEDGMENTS
REFERENCES
Armitage, P., Berry, G. & Matthews, J. N. S. (2002) Statistical Methods in Medical Research (4th edn). Oxford: Blackwell.
Barber, J. & Thompson, S. G. (2000a) Analysis and interpretation of cost data in randomised controlled trials: review of published studies. BMJ, 317, 1195–1200.
Barber, J. & Thompson, S. G. (2000b) Analysis of cost data in randomized trials: an application of the non-parametric bootstrap. Statistics in Medicine, 19, 3219–3236.
Berk, R. A. (1990) A primer in robust regression. In Modern Methods of Data Analysis (eds J. Fox & J. S. Long), pp. 292–324. Newbury Park, CA: Sage Publications.
Bonizzato, P., Bisoffi, G., Amaddeo, F., et al (2000) Community-based mental health care: to what extent are service costs associated with clinical, social and service history variables? Psychological Medicine, 30, 1205–1215.
Byford, S., Barber, J. A., Fiander, M., et al (2001) Factors that influence the cost of caring for patients with severe psychotic illness. Report from the UK 700 trial. British Journal of Psychiatry, 178, 441–447.
Chisholm, D. & Knapp, M. (2002) The economics of schizophrenia care in Europe: the EPSILON study. Epidemiologia e Psichiatria Sociale, 11, 12–17.
Deb, P. & Holmes, M. (2002) Estimates of use and costs of behavioural health care: a comparison of standard and finite mixture models. In Econometric Analysis of Health Data (eds A. M. Jones & O. O'Donnell), pp. 87–99. New York: John Wiley.
Desgagné, A., Castilloux, A.-M., Angers, J. F., et al (1998) The use of the bootstrap statistical method for the pharmacoeconomic cost analysis of skewed data. Pharmacoeconomics, 13, 487–497.
Diehr, P., Yanez, D., Ash, A., et al (1999) Methods for analyzing health care utilisation and costs. Annual Review of Public Health, 20, 125–144.
Draper, N. R. & Smith, H. (1998) Applied Regression Analysis (3rd edn). New York: John Wiley.
Duan, N. (1983) Smearing estimate: a non-parametric retransformation method. Journal of the American Statistical Association, 78, 605–610.
Duan, N., Manning, W. G., Morris, C. N., et al (1983) A comparison of alternative models for demand for medical care. Journal of Business and Economic Statistics, 1, 115–126.
Duan, N., Manning, W. G., Morris, C. N., et al (1984) Choosing between the sample selection model and the multi-part model. Journal of Business and Economic Statistics, 2, 283–289.
Dunn, G. (1989) Design and Analysis of Reliability Studies. London: Edward Arnold.
Efron, B. & Tibshirani, R. J. (1993) An Introduction to the Bootstrap. London: Chapman & Hall.
Everitt, B. S. & Dunn, G. (2001) Applied Multivariate Data Analysis (2nd edn). London: Edward Arnold.
Greene, W. H. (2000) Econometric Analysis (4th edn). Englewood Cliffs, NJ: Prentice Hall.
Hoch, J. S., Briggs, A. H. & Willan, A. (2002) Something old, something new, something borrowed, something blue: a framework for the marriage of health econometrics and cost-effectiveness analysis. Health Economics, 11, 415–430.
Jones, A. M. & O'Donnell, O. (2002) Econometric Analysis of Health Data. New York: John Wiley.
Kennedy, P. (1998) A Guide to Econometrics (4th edn). Oxford: Blackwell Publishers.
Kilian, R., Matschinger, H., Löffler, W., et al (2002) A comparison of methods to handle skew cost variables in the analysis of the resource consumption in schizophrenia treatment. Journal of Mental Health Policy and Economics, 5, 21–31.
Knapp, M., Chisholm, D., Leese, M., et al (2003) Comparing patterns and costs of schizophrenia in five European countries: the EPSILON Study. Acta Psychiatrica Scandinavica, 105, 42–54.
Lin, L. I.-K. (1989) A concordance correlation coefficient to evaluate reproducibility. Biometrics, 45, 255–268.
Mosteller, F. & Tukey, J. W. (1977) Data Analysis and Regression. Reading, MA: Addison Wesley.
Mullahy, J. (1998) Much ado about two: reconsidering retransformation and the two-part model in health econometrics. Journal of Health Economics, 17, 247–281.
O'Hagan, A. & Stevens, J. W. (2003) Assessing and comparing costs: how robust are the bootstrap and methods based on asymptotic normality? Health Economics, 12, 33–49.
Picard, R. R. & Berk, K. N. (1990) Data splitting. American Statistician, 44, 140–147.
Sackett, D. L., Haynes, R. B., Guyatt, G. H., et al (1991) Clinical Epidemiology: A Basic Science for Clinical Medicine. Boston, MA: Little Brown.
Stinnett, A. A. & Mullahy, J. (1998) Net health benefits: a new framework for the analysis of uncertainty in cost-effectiveness analysis. Medical Decision Making, 18 (Pharmacoeconomics special issue), S68–S80.
Tambour, M., Zethraeus, N. & Johannesson, M. (1998) A note on confidence intervals in cost-effectiveness analysis. International Journal of Technology Assessment in Health Care, 14, 467–471.
Theil, H. (1966) Applied Economic Forecasting. Amsterdam: North Holland.
Verbeek, M. (2000) A Guide to Modern Econometrics. New York: John Wiley.
Wooldridge, J. M. (2003) Introductory Econometrics (2nd edn). Mason, OH: South-Western College Publishing.
Zheng, B. & Agresti, A. (2000) Summarising the predictive power of a generalised linear model. Statistics in Medicine, 19, 1771–1781.
Received for publication July 4, 2002. Revision received May 15, 2003. Accepted for publication June 3, 2003.