Psychological Therapies Research Centre, University of Leeds, Leeds, UK
Correspondence: Professor Michael Barkham, Psychological Therapies Research Centre, 17 Blenheim Terrace, University of Leeds, Leeds LS2 9JT, UK. E-mail: m.barkham@leeds.ac.uk
Declaration of interest M.B. received funding from the Mental Health Foundation and the Artemis Trust to support the development of the CORE-OM and CORE-A, respectively.
ABSTRACT
Aims To test the suitability and utility of the Clinical Outcomes in Routine Evaluation - Outcome Measure (CORE-OM) and CORE-Assessment (CORE-A) assessment tools.
Method Service intake data were analysed from counselling and psychological therapy services in 32 primary care settings and 17 secondary care settings.
Results Completion rates exceeded 98% in both of the settings sampled. Intake severity levels were similar but secondary care patients were more likely to score above the risk cut-off and the severe threshold and to have experienced their problems for a greater duration.
Conclusions The CORE-OM and CORE-A are suitable assessment tools that show small but logical differences between psychological therapy services in primary- and secondary-based care.
INTRODUCTION
METHOD
Patient samples
Patients not completing the CORE-OM or missing more than three items from
the 34-item measure were excluded from the mean score calculations. Using
these criteria, 5733 primary care patients and 1918 secondary care patients
were selected for inclusion. Table
1 presents demographic information for the two patient
samples.
Measures
Patient-completed measure: CORE-OM
The CORE-OM comprises 34 items addressing domains of subjective well-being (4 items), symptoms (12 items), functioning (12 items) and risk (6 items; 4 'risk to self' items and 2 'risk to others' items). Within the symptom domain, item clusters address anxiety (4 items), depression (4 items), physical problems (2 items) and trauma (2 items). The functioning domain item clusters address general functioning (4 items), close relationships (4 items) and social relationships (4 items). Items are scored on a five-point scale from 0 (not at all) to 4 (all the time). Half of the items focus on low-intensity problems (e.g. 'I feel anxious/nervous') and half focus on high-intensity problems (e.g. 'I feel panic/terror'). Eight items are keyed positively.
All services in the study asked patients to complete the CORE-OM as a measure of distress at intake (i.e. before any intervention). In practice, the CORE-OM was completed during screening or assessment by 73.8% of primary care patients and 87.3% of secondary care patients, and completed at the first therapy session by the remaining 26.2% in primary care and 12.7% in secondary care.
Practitioner-completed measure: CORE-A
The CORE-A enables the collection of referral information, demographics,
assessment, outcome, and data on presenting problem severity and duration. The
CORE-A lists the following 14 problems: depression, anxiety, psychosis,
personality problems, cognitive/learning difficulties, eating disorder,
physical problems, addictions, trauma/abuse, bereavement, self-esteem,
interpersonal problems, living/welfare and work/academic. At initial
assessment, practitioners recorded the presence or absence of these problems
and rated the severity of each presenting problem on a scale from 1
(minimal) to 4 (severe). The duration of problems
was recorded under four categories: <6 months, 6-12 months, >12 months
or recurring/continuous.
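As an illustration of the kind of record this yields, the following is a hypothetical Python sketch of a single practitioner-rated presenting problem; the field names, class and example record are our own representation for analysis, not part of the CORE system itself.

```python
from dataclasses import dataclass
from typing import Optional

# The 14 presenting problems listed on the CORE-A form
PROBLEMS = [
    "depression", "anxiety", "psychosis", "personality problems",
    "cognitive/learning difficulties", "eating disorder", "physical problems",
    "addictions", "trauma/abuse", "bereavement", "self-esteem",
    "interpersonal problems", "living/welfare", "work/academic",
]

# Duration categories recorded at initial assessment
DURATION_CATEGORIES = ["<6 months", "6-12 months", ">12 months", "recurring/continuous"]

@dataclass
class ProblemRating:
    """One presenting problem as rated by the practitioner at assessment (hypothetical layout)."""
    problem: str
    severity: Optional[int]   # 1 (minimal) to 4 (severe); None = problem absent
    duration: Optional[str]   # one of DURATION_CATEGORIES, if recorded

# Example record: moderately severe depression, present for more than a year
record = ProblemRating(problem="depression", severity=3, duration=">12 months")
present = record.severity is not None  # a problem counts as 'present' if given any rating
```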
Data analysis
All data were scanned optically using FOR-MIC software (Formic Design and
Automatic Data Capture, 1996). Statistical analyses were carried out using the
Statistical Package for the Social Sciences for Windows (version 11). The
CORE-OM overall mean scores and non-risk scores were calculated using
pro-rating, where up to three items were missed (i.e. if two
items were not completed, the total score would be divided by 32 rather than
34). Domain mean scores were not pro-rated if more than one item
was missing from that domain.
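A minimal sketch of this pro-rating rule is given below; the data layout, item ordering and example responses are hypothetical and do not reproduce the actual CORE-OM form, only the arithmetic described above.

```python
from typing import Optional, Sequence

def prorated_mean(item_scores: Sequence[Optional[int]],
                  max_missing: int = 3) -> Optional[float]:
    """Mean item score (0-4 scale), pro-rated over completed items.

    Returns None if more than `max_missing` items are missing, in which case
    the respondent would be excluded from the mean-score calculations.
    """
    completed = [s for s in item_scores if s is not None]
    n_missing = len(item_scores) - len(completed)
    if n_missing > max_missing:
        return None  # excluded from analysis
    # e.g. with 2 of 34 items missing, the total is divided by 32 rather than 34
    return sum(completed) / len(completed)

# Hypothetical 34-item response with two omitted items (None)
responses = [2, 3, 1, None, 0, 4, 2, 2, 1, 3, None, 2, 2, 1, 0, 3, 2,
             1, 2, 3, 2, 1, 0, 2, 3, 1, 2, 2, 1, 0, 3, 2, 1, 2]
overall = prorated_mean(responses)                       # pro-rated over 32 items
# Domain scores use a stricter rule: at most one missing item per domain
wellbeing = prorated_mean(responses[:4], max_missing=1)  # illustrative 4-item domain
```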
Completion rates (n clients completing the CORE-OM) and missing items were analysed using the full data-set (n=6610 primary care and n=2311 secondary care). All subsequent analyses were carried out on the samples of patients completing the CORE-OM and fulfilling the criteria for pro-rating (n=5733 primary care and n=1918 secondary care).
Internal consistency of the CORE-OM was calculated using Cronbach's coefficient α (Cronbach, 1951). Statistical power was high due to the large sample sizes; therefore, differences in mean scores between samples are reported using confidence intervals (Gardner & Altman, 1986) and effect sizes (Cohen, 1988) rather than significance tests. An effect size represents a standard deviation unit and is calculated as the difference between means divided by the pooled standard deviation. The standard guide to effect size differences denotes three bands: 0.2 (small), 0.5 (medium) and 0.8 (large). On the basis of Cohen (1988), noting that a 0.2 effect size involves an 85% overlap between distributions, it has been suggested that an effect size of 0.4 (involving a 73% overlap) be used as the criterion for clinically meaningful differences (Elliott et al, 1993). Chi-squared analysis was used to test proportional differences between samples (e.g. demographic characteristics).
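The effect size calculation described above can be sketched directly; the figures below are invented for illustration and are not drawn from the samples reported here, and the pooling convention shown (sample-size-weighted variances) is one common choice.

```python
import math

def cohens_d(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> float:
    """Effect size: difference between means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

# Invented figures: a primary care v. secondary care comparison of domain means
d = cohens_d(mean1=1.85, sd1=0.75, n1=5733, mean2=1.80, sd2=0.80, n2=1918)
# |d| below 0.2 falls under Cohen's 'small' band and well below the 0.4
# criterion suggested for clinically meaningful differences
```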
To facilitate comparisons regarding the range of severity, we applied two cut-offs to the CORE-OM data that reflected differing levels of severity (for details, see Jacobson & Truax, 1991). The first cut-off on the CORE-OM, termed 'clinical', was defined as a score of 1.19 for men and 1.29 for women, and was derived by calculating the CORE-OM score that best demarcates membership of the general population (i.e. a lower score) or of a clinical population (i.e. a higher score), using the following formula (see Evans et al, 2002):
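Presuming the standard Jacobson & Truax (1991) form applied by Evans et al (2002), the cut-off weights each population's mean by the other population's standard deviation; the symbols below are our own notation, not taken from the original.

```latex
% Presumed Jacobson & Truax (1991) cut-off as applied in Evans et al (2002);
% subscripts c (clinical sample) and g (general population) are our notation.
\[
  \text{cut-off} = \frac{s_{g}\,\bar{x}_{c} + s_{c}\,\bar{x}_{g}}{s_{g} + s_{c}}
\]
```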
RESULTS
The most commonly missed item in both primary and secondary settings was no. 19 ('I have felt warmth and affection for someone'). The overall item omission rates were 0.9% (95% CI 0.7-1.2%) for primary care and 0.8% (95% CI 0.5-1.3%) for secondary care. In the primary care sample, five items had missing cases above the upper threshold (1.2%) of the 95% confidence interval. In the secondary care sample, two items had missing cases above the threshold (1.3%). Table 2 summarises the items above the threshold in each sample.
Internal consistency
We used Cronbach's coefficient α to calculate the internal reliability of the CORE-OM domains and of the item clusters within domains for both primary and secondary care settings. Although the item clusters were originally selected to represent the range of patient experience and were not intended to be used as sub-scales, we calculated α values for them in order to test the robustness of the components within each domain. The α value indicates the proportion of covariance between items. Table 3 illustrates that all domains showed good internal reliability, with α > 0.70 and < 0.97 in each setting. In both primary and secondary care, the well-being domain had the lowest internal consistency. Values of α exceeded 0.70 for six of the nine item clusters - anxiety, depression, trauma, general functioning, social relationships, and risk to self - whereas α for close relationships was 0.65-0.70. Only for physical problems and risk to others (both of which comprised just two items) was α < 0.60.
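Cronbach's α can be computed from a respondents-by-items score matrix as follows; the response matrix in this sketch is hypothetical and is only intended to show the calculation.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical example: 5 respondents x 4 well-being items scored 0-4
scores = np.array([[1, 2, 2, 1],
                   [3, 3, 4, 3],
                   [0, 1, 1, 0],
                   [2, 2, 3, 2],
                   [4, 3, 4, 4]])
alpha = cronbach_alpha(scores)  # higher values indicate greater covariance between items
```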
The CORE-OM profile of severity of problems
Overall scores
To compare the overall CORE-OM scores in primary and secondary care
settings, we generated notched boxplots and histograms presenting the
distribution of CORE-OM mean scores for all items (see Figs
1 and
2). In terms of overall mean
scores, the two settings showed a strikingly similar distribution.
Figure 1 shows that there were four outliers in the primary care sample scoring above the maximum secondary care score of 3.65, and that no patient in either setting scored 4. As illustrated in Fig. 2, the distributions are near-symmetrical, although they differ in total frequency as a result of the different n in each sample.
Domain scores
We calculated mean scores for each domain to determine whether patients in
primary and secondary care settings showed a different profile of scores.
Table 4 presents CORE-OM scores
by domain for the two service settings, together with effect sizes indicating
the degree of difference between populations. Although all effect size
differences were small (i.e. appreciably below 0.20), secondary
patients did report higher levels of risk (effect size -0.15). The well-being
domain showed the opposite trend, with primary care patients reporting poorer
subjective well-being than secondary care patients (effect size 0.08).
Item scores
We analysed the mean scores for each of the 34 CORE-OM items across the two service settings to establish whether any items appeared to function differently in these patient groups. Comparison of the mean item scores using Cohen's effect size methodology indicated that secondary care patients scored higher than primary care patients on all four 'risk to self' items: item 9 'I have thought of hurting myself' (effect size -0.14), item 16 'I have made plans to end my life' (effect size -0.12), item 24 'I have thought it would be better if I were dead' (effect size -0.14) and item 34 'I have hurt myself physically or taken dangerous risks with my health' (effect size -0.15). There was no difference between primary and secondary care patients on the two 'risk to others' items: item 6 'I have been physically violent to others' (effect size 0.00) and item 22 'I have threatened or intimidated another person' (effect size -0.03). Primary care patients scored higher than secondary care patients on item 14 'I have felt like crying' (effect size 0.22) and item 18 'I have had difficulty getting to sleep or staying asleep' (effect size 0.13).
Application of clinical cut-offs
We applied the two cut-off thresholds to the data and
Table 5 presents the proportion
of patients in each setting above or equal to the cut-off thresholds.
Chi-squared tests showed that a significantly higher proportion of primary
care patients than secondary care patients were above the clinical cut-off for
the well-being domain and non-risk items (P<0.01). However, as
noted in the methodology, the statistical power of the data-set was high due
to the large n, increasing the likelihood of statistical significance
for small differences. Odds ratio (OR) analysis showed that secondary care
patients were only marginally less likely to be above these cut-offs (OR=0.84
for well-being; OR=0.85 for non-risk items). Secondary care patients were more
likely than primary care patients to be above the risk cut-off (OR=1.23, CI
1.10-1.36) and more likely to be above the severe threshold
(OR=1.34, CI 1.17-1.53).
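Odds ratios of this kind can be derived from a 2x2 table of counts; the cell counts in the sketch below are invented for illustration only, as the actual frequencies are not reproduced here.

```python
import math

def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and approximate 95% CI for a 2x2 table:
       a = secondary care, above cut-off   b = secondary care, below cut-off
       c = primary care,   above cut-off   d = primary care,   below cut-off
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)   # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, (lower, upper)

# Invented counts for illustration only
or_value, ci = odds_ratio_ci(a=700, b=1218, c=1800, d=3933)
# OR > 1 would indicate that secondary care patients are more likely than
# primary care patients to be above the cut-off in question
```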
Patient-rated CORE-OM severity and presenting problems
We used the practitioner rating provided on the CORE-A form to determine
patients' presenting problems. We classified each problem as present if given
any rating by the practitioner from 1 (minimal) to 4
(severe) and absent if no rating was given.
Table 6 presents the mean
CORE-OM scores for patients grouped by presenting problem. Groups were not
independent because many patients were rated as presenting with more than one
problem.
The effect size analysis in Table 6 shows that CORE-OM risk scores were a key factor in differentiating secondary care patients from primary care patients across the presenting problems. Secondary care patients had higher risk scores than primary care patients for all presenting problems, except addictions where both primary and secondary patients had relatively high mean risk scores. For patients with psychosis, personality problems and eating disorders (problems traditionally seen in specialist services), risk scores were substantially higher in secondary than in primary care (effect size >0.3). In addition, patients with psychosis, eating disorders and living/welfare problems also showed higher non-risk scores in secondary care than in primary care (i.e. higher levels of overall distress; effect size >0.1).
Practitioner-rated CORE-A profile of severity of presenting problems
We used the CORE-A data to compare the practitioner-rated severity and
duration of problems experienced in primary and secondary care settings.
Table 7 presents the mean
practitioner rating of the severity of the presenting problems. Effect size
analysis in Table 7 shows that
practitioners rated the severity of anxiety and bereavement higher in primary
care than in secondary care settings (effect size >0.2), but the severity
of personality problems, cognitive difficulties, eating disorder and physical
problems was rated as higher in secondary care than in primary care settings
(effect size >0.2). We were mindful that such differences could reflect
differential anchor points in terms of perceptions of problems between
practitioners from primary and secondary care settings. Accordingly, we
sampled two ranges of CORE-OM scores - a lower range (CORE-OM range 1.00-1.60)
and a higher range (CORE-OM range 2.20-2.80) - to check that there were no
meaningful differences between primary and secondary care practitioners'
ratings within these ranges. The mean effect size (low and high range) between
primary and secondary care practitioners' ratings for each presenting problem
fell below the 0.4 effect size criterion.
Table 8 presents the mean rating of duration of the presenting problems in primary and secondary care settings. For all the presenting problems, secondary care patients were rated as having experienced the problem for a greater duration than primary care patients (all effect sizes exceeding 0.2 in magnitude). The greatest difference in problem duration was for psychosis (effect size -0.7).
DISCUSSION
Suitability
In relation to the appropriateness of these tools in different service settings, the findings show that the CORE-OM is acceptable to clients in both settings (as evidenced by high completion rates) and is robust in its structure across different settings (as evidenced by high internal reliabilities), a robustness that extends to most of the item clusters. However, it is acknowledged that this evidence pertains to counselling and psychological therapy services and could differ in other service settings. In addition, a minority of patients completed their measures at their first session rather than at screening or intake assessment. However, the realities of routine practice settings probably demand reasonable flexibility in the pursuit of maximising compliance in completing the assessment measures.
In administering the same measure in both primary and secondary settings, it might be presumed that the CORE-OM would generate a ceiling effect in secondary care services. We found no evidence of this in the data that we examined. However, we distinguish clearly between patients seen in out-patient settings within secondary care services (as reported here) and patients deemed to be within a category that has been referred to as 'serious and enduring mental illness'. For such patients, the process of understanding and completing a self-report measure might yield results that are not necessarily continuous with those reported here (e.g. they might under-score rather than produce logically higher scores). However, Whewell & Bonanno (2000) reported that the risk sub-scale was clinically valid and that CORE-A and CORE-OM scores matched for patients with borderline personality disorder. Where CORE-OM scores might not be considered safe, the CORE-A form completed by the practitioner would be the sole source of information.
Utility
Although we found general homogeneity between primary and secondary care settings in self-ratings on the CORE-OM, there was clear evidence that the CORE-OM discriminated between patients in secondary and primary care, showing secondary care patients to be more likely to score higher on risk and to be above the severe threshold. These two components support the ability of the CORE-OM to discriminate appropriately between service settings, a finding supported by the practitioners' consistent reporting of greater duration of patients' presenting problems in secondary care. These findings may provide an additional tool in the recognition by healthcare professionals of those patients potentially at risk of suicide (e.g. Gunnell & Harbord, 2003).
Our data showed primary care patients to be characterised by more acute problems (i.e. problems that received a lower duration rating). The self-rated severity may be related to the acute nature of these problems: item analysis showed higher ratings on the item 'I have felt like crying', which is likely to reflect the immediacy of the problems experienced. In contrast, secondary care patients were characterised by more chronic problems (i.e. of longer duration) and higher risk scores on the CORE-OM. This agrees with the therapist-rated chronicity of problems in practice settings of counselling and clinical psychology (Cape & Parham, 2001). This profile of patients in secondary services appears to be a logical consequence of referral procedures and waiting times. However, we are mindful that practitioners in primary and secondary care settings may have differential anchor points in their evaluation of the severity of the presenting problems. When we controlled for patient-rated severity, we still found at least a 75% overlap in the distributions of primary- and secondary-based practitioners' ratings. Notwithstanding this overlap, our view is that practitioner ratings will be influenced by a myriad of professional and contextual factors that will require further research to ensure standard use in routine settings.
The use of both patient- and practitioner-completed assessment forms marks a step forward from reliance on either patient perception alone or established assessment packages using practitioner ratings alone (e.g. Health of the Nation Outcome Scales; Wing et al, 1998). The use of such data provides a logical base for benchmarking service delivery systems (e.g. Barkham et al, 2001) and adds to a developing literature (e.g. Slade et al, 2001) providing low-cost but reliable measures that can be adopted routinely in mental health settings.
Clinical Implications and Limitations
LIMITATIONS
ACKNOWLEDGMENTS
REFERENCES
Barkham, M., Margison, F., Leach, C., et al (2001) Service profiling and outcomes benchmarking using the CORE-OM: toward practice-based evidence in the psychological therapies. Journal of Consulting and Clinical Psychology, 69, 184-196.
Cape, J. & Parham, A. (2001) Rated casemix of general practitioner referrals to practice counsellors and clinical psychologists: a retrospective survey of a year's caseload. British Journal of Medical Psychology, 74, 237-246.
Cohen, J. (1988) Statistical Power Analysis for the Behavioural Sciences (2nd edn). New Jersey: Lawrence Erlbaum.
Cronbach, L. J. (1951) Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.
Department of Health (2001) The Mental Health Policy Implementation Guide. London: Department of Health.
Elliott, R., Stiles, W. B. & Shapiro, D. A. (1993) Are some psychotherapies more equivalent than others? In Handbook of Effective Psychotherapy (ed. T. R. Giles), pp. 455-479. New York: Plenum Press.
Evans, C., Connell, J., Barkham, M., et al (2002) Towards a standardised brief outcome measure: psychometric properties and utility of the CORE-OM. British Journal of Psychiatry, 180, 51-60.
Gardner, M. J. & Altman, D. G. (1986) Confidence intervals rather than P values: estimation rather than hypothesis testing. BMJ, 292, 746-750.
Gunnell, D. & Harbord, R. (2003) Suicidal thoughts. In Better or Worse: A Longitudinal Study of the Mental Health of Adults, pp. 45-65. London: TSO.
Jacobson, N. & Truax, P. (1991) Clinical significance: a statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19.
Mellor-Clark, J. & Barkham, M. (2000) Quality evaluation: methods, measures and meaning. In Handbook of Counselling and Psychotherapy (eds C. Feltham & J. Horton), pp. 225-270. London: Sage Publications.
Mellor-Clark, J., Barkham, M., Connell, J., et al (1999) Practice-based evidence and standardized evaluation: informing the design of the CORE system. European Journal of Psychotherapy, Counselling and Health, 2, 357-374.
Slade, M., Cahill, S., Kelsey, W., et al (2001) Threshold 3: the feasibility of the Threshold Assessment Grid (TAG) for routine assessment of the severity of mental health problems. Social Psychiatry and Psychiatric Epidemiology, 36, 516-521.
Whewell, P. & Bonanno, D. (2000) The Care Programme Approach and risk assessment of borderline personality disorder: clinical validation of the CORE risk sub-scale. Psychiatric Bulletin, 24, 381-384.
Wilson, E. B. (1927) Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22, 209-212.
Wing, J. K., Beevor, A., Curtis, R. H., et al (1998) Health of the Nation Outcome Scales (HoNOS): research and development. British Journal of Psychiatry, 172, 11-18.
Received for publication February 18, 2004. Revision received September 9, 2004. Accepted for publication September 10, 2004.