Suitability and utility of the CORE-OM and CORE-A for assessing severity of presenting problems in psychological therapy services based in primary and secondary care settings

Michael Barkham, PhD, Naomi Gilbert, BSc, Janice Connell, BSc, Chris Marshall, BSc and Elspeth Twigg, BSc

Psychological Therapies Research Centre, University of Leeds, Leeds, UK

Correspondence: Professor Michael Barkham, Psychological Therapies Research Centre, 17 Blenheim Terrace, University of Leeds, Leeds LS2 9JT, UK. E-mail: m.barkham{at}leeds.ac.uk

Declaration of interest M.B. received funding from the Mental Health Foundation and the Artemis Trust to support the development of the CORE-OM and CORE-A, respectively.


   ABSTRACT
 
Background There is a need for reliable assessment tools that are suitable for counselling and psychological therapy services in primary and secondary care settings.

Aims To test the suitability and utility of the Clinical Outcomes in Routine Evaluation - Outcome Measure (CORE-OM) and CORE-Assessment (CORE-A) assessment tools.

Method Service intake data were analysed from counselling and psychological therapy services in 32 primary care settings and 17 secondary care settings.

Results Completion rates exceeded 98% in both of the settings sampled. Intake severity levels were similar but secondary care patients were more likely to score above the risk cut-off and the severe threshold and to have experienced their problems for a greater duration.

Conclusions The CORE-OM and CORE-A are suitable assessment tools that show small but logical differences between psychological therapy services based in primary and secondary care.


   INTRODUCTION
 
There is increasing pressure on mental health services to adopt assessment and outcome measures that can be used routinely in mental health settings (Department of Health, 2001). Measures need to be appropriate for specific patient populations but also be capable of ‘following the patient’ through the various tiers of mental health services. The Clinical Outcomes in Routine Evaluation - Outcome Measure (CORE-OM; Barkham et al, 1998, 2001; Evans et al, 2002) has become a widely used patient self-report measure across service settings delivering psychological treatments, together with a practitioner-completed component termed the CORE-Assessment (CORE-A; Mellor-Clark et al, 1999; Mellor-Clark & Barkham, 2000). However, there has been no test to compare the CORE-OM and CORE-A in assessing the severity of presenting problems in bona fide primary versus secondary care settings. Accordingly, first we investigate whether the CORE-OM and CORE-A are appropriate as assessment tools in both service settings, and then we identify whether they reflect differences between the two settings.


   METHOD
 
The data
This study reports on data collected by 49 National Health Service (NHS) sites routinely using the CORE-OM to monitor patients at intake to their services. The data were anonymised and aggregated, and are independent of the data set used in a previous study reporting the psychometric properties of the CORE-OM (Evans et al, 2002). In total, 32 sites were primary care based, providing counselling or psychology services within primary care groups or trusts. The remaining 17 sites were secondary care based and provided clinical psychology and psychotherapy services. The majority of referrals were from general practitioners, accounting for 93.3% of referrals to primary care sites and 64.5% to secondary care sites. Data (CORE-OM and/or CORE-A) were completed for 6610 primary care patients and 2311 secondary care patients in total.

Patient samples
Patients not completing the CORE-OM or missing more than three items from the 34-item measure were excluded from the mean score calculations. Using these criteria, 5733 primary care patients and 1918 secondary care patients were selected for inclusion. Table 1 presents demographic information for the two patient samples.


Table 1 Patient sample demographics
 

Measures
Patient-completed measure: CORE-OM
The CORE-OM comprises 34 items addressing domains of subjective well-being (4 items), symptoms (12 items), functioning (12 items) and risk (6 items; 4 ‘risk to self’ items and 2 ‘risk to others’ items). Within the symptom domain ‘item clusters’ address anxiety (4 items), depression (4 items), physical problems (2 items) and trauma (2 items). The functioning domain item clusters address general functioning (4 items), close relationships (4 items) and social relationships (4 items).

Items are scored on a five-point scale from 0 (‘not at all’) to 4 (‘all the time’). Half of the items focus on low-intensity problems (e.g. ‘I feel anxious/nervous’) and half focus on high-intensity problems (e.g. ‘I feel panic/terror’). Eight items are keyed positively.

All services in the study asked patients to complete the CORE-OM as a measure of distress at intake (i.e. before any intervention). In practice, the CORE-OM was completed during screening or assessment by 73.8% of primary care patients and 87.3% of secondary care patients, and completed at the first therapy session by the remaining 26.2% in primary care and 12.7% in secondary care.

Practitioner-completed measure: CORE-A
The CORE-A enables the collection of referral information, demographics, assessment and outcome data, and data on the severity and duration of presenting problems. The CORE-A lists the following 14 problems: depression, anxiety, psychosis, personality problems, cognitive/learning difficulties, eating disorder, physical problems, addictions, trauma/abuse, bereavement, self-esteem, interpersonal problems, living/welfare and work/academic. At initial assessment, practitioners recorded the presence or absence of these problems and rated the severity of each presenting problem on a scale from 1 (‘minimal’) to 4 (‘severe’). The duration of problems was recorded under four categories: <6 months, 6-12 months, >12 months or recurring/continuous.

Data analysis
All data were scanned optically using Formic software (Formic Design and Automatic Data Capture, 1996). Statistical analyses were carried out using the Statistical Package for the Social Sciences for Windows (version 11). The CORE-OM overall mean scores and non-risk scores were calculated using ‘pro-rating’ where up to three items were missed (i.e. if two items were not completed, the total score would be divided by 32 rather than 34). Domain mean scores were not ‘pro-rated’ if more than one item was missing from that domain.
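The pro-rating rule described above can be sketched as follows (an illustrative implementation, not the software actually used in the study):

```python
def prorated_mean(responses, n_items=34, max_missing=3):
    """Pro-rated CORE-OM mean score: average the completed items,
    provided no more than three of the 34 items were omitted.
    Omitted items are represented as None."""
    completed = [r for r in responses if r is not None]
    if n_items - len(completed) > max_missing:
        return None  # too many omissions: excluded from mean-score analyses
    return sum(completed) / len(completed)
```

With two omitted items, for example, the total is divided by 32 rather than 34, exactly as described above.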

Completion rates (n clients completing the CORE-OM) and missing items were analysed using the full data-set (n=6610 primary care and n=2311 secondary care). All subsequent analyses were carried out on the samples of patients completing the CORE-OM and fulfilling the criteria for pro-rating (n=5733 primary care and n=1918 secondary care).

Internal consistency of the CORE-OM was calculated using Cronbach’s coefficient α (Cronbach, 1951). Statistical power was high owing to the large sample sizes, so differences in mean scores between samples are reported using confidence intervals (Gardner & Altman, 1986) and effect sizes (Cohen, 1988) rather than significance tests. An ‘effect size’ represents a standard deviation unit and is calculated as the difference between means divided by the pooled standard deviation. The standard guide to effect size differences denotes three bands: 0.2 (small), 0.5 (medium) and 0.8 (large). On the basis of Cohen (1988), noting that a 0.2 effect size involves an 85% overlap between distributions, it has been suggested that an effect size of 0.4 (involving a 73% overlap) be used as the criterion for clinically meaningful differences (Elliott et al, 1993). Chi-squared analysis was used to test proportional differences between samples (e.g. demographic characteristics).
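The effect size calculation described above (Cohen, 1988) amounts to the following sketch; the values in the test case are illustrative, not figures from the study tables:

```python
from math import sqrt

def effect_size(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d: difference between group means divided by the
    pooled standard deviation of the two groups."""
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd
```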

To facilitate comparisons regarding the range of severity, we applied two cut-offs to the CORE-OM data that reflected differing levels of severity (for details, see Jacobson & Truax, 1991). The first cut-off on the CORE-OM, termed ‘clinical’, was defined as a score of 1.19 for men and 1.29 for women and was derived from calculating the CORE-OM score that would best demarcate membership of the general population (i.e. a lower score) or a clinical population (i.e. a higher score) using the following formula (see Evans et al, 2002):

cut-off = (s1 × mean2 + s2 × mean1) / (s1 + s2)

where mean1 and s1 denote the mean and standard deviation of the clinical population and mean2 and s2 those of the general population.
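The derivation of the ‘clinical’ cut-off (Jacobson & Truax, 1991) can be sketched as below; the means and standard deviations passed to the function are illustrative placeholders, not the CORE-OM normative values from which 1.19 and 1.29 were obtained:

```python
def clinical_cutoff(clin_mean, clin_sd, gen_mean, gen_sd):
    """Jacobson & Truax criterion c: the score equidistant, in
    standard-deviation units, from the clinical-population mean and
    the general-population mean."""
    return (clin_sd * gen_mean + gen_sd * clin_mean) / (clin_sd + gen_sd)
```

When the two standard deviations are equal, the cut-off is simply the midpoint of the two means.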

The second cut-off, termed ‘severe’, was a CORE-OM score of 2.50 (both men and women) that approximated to a score of 1 s.d. above the mean for a clinical population and differentiated a mild/moderate clinical population from a severe clinical population (see Barkham et al, 2001). Odds ratio analysis was applied to estimate the caseness rate ratio using clinical cut-off points for the CORE-OM. Effect sizes and confidence intervals for proportions (Wilson, 1927) were calculated using Microsoft Excel 2000.
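The Wilson (1927) score interval for a proportion, computed by the authors in Excel, can be sketched in a few lines (a minimal illustration, not the authors' spreadsheet):

```python
from math import sqrt

def wilson_ci(successes, n, z=1.96):
    """Wilson score confidence interval for a proportion
    (z=1.96 gives the 95% interval)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] and behaves well for proportions near 0 or 1.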


   RESULTS
 
Acceptability
In order to assess whether the CORE-OM was acceptable to clients in both primary and secondary care settings, we examined completion rates (i.e. the number of clients completing the CORE-OM) and missed items at intake assessment. Of the total, 5833 (88.3%) primary care patients and 1940 (84.0%) secondary care patients completed the CORE-OM. The completion rate was significantly higher for the primary care sample (χ²=28.2, P<0.001, 95% CI difference 2.7-6.0%). However, the proportion of completed measures with no more than three items missing (i.e. within the criteria for pro-rating) was similar in both settings: 5733 (98.3%) in primary care and 1918 (98.9%) in secondary care (χ²=3.2, P=0.08, 95% CI −1.0 to 0.1%).
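The completion-rate comparison above can be reproduced with a standard Pearson chi-squared test on the 2×2 table of completers versus non-completers (a sketch; the study itself used SPSS):

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for
    the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Completers v. non-completers: primary care 5833 of 6610, secondary care 1940 of 2311
chi2 = chi_squared_2x2(5833, 6610 - 5833, 1940, 2311 - 1940)  # ~28, matching the reported 28.2
```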

The most commonly missed item in both primary and secondary settings was no. 19 (‘I have felt warmth and affection for someone’). The overall item omission rates were 0.9% (95% CI 0.7-1.2%) for primary care and 0.8% (95% CI 0.5-1.3%) for secondary care. In the primary care sample, five items had missing cases above the upper threshold (1.2%) of the 95% confidence interval. In the secondary care sample, two items had missing cases above the threshold (1.3%). Table 2 summarises the items above the threshold in each sample.


Table 2 The CORE–Outcome Measure items above the 95% CI omission threshold
 

Internal consistency
We used Cronbach’s coefficient α to calculate the internal reliability of the CORE-OM domains and item clusters within domains for both primary and secondary care settings. Although the item clusters were originally selected to represent the range of patient experience and not intended to be used as sub-scales, we calculated α values for them in order to test the robustness of the components within each domain. The α value indicates the proportion of covariance between items. Table 3 illustrates that all domains showed good internal reliability, with α >0.70 and <0.97 in each setting. In both primary and secondary care, the well-being domain had the lowest internal consistency. Values of α exceeded 0.70 for six of the nine item clusters - anxiety, depression, trauma, general functioning, social relationships, and risk to self - whereas for close relationships α was 0.65-0.70. Only for physical problems and risk to others (both of which comprised just two items) was α<0.60.
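Cronbach’s α as used here can be sketched from first principles (a minimal pure-Python illustration; the study used SPSS):

```python
def cronbach_alpha(item_columns):
    """Cronbach's alpha. item_columns is a list of item-score lists,
    one list per item, each of equal length (one score per respondent)."""
    k = len(item_columns)
    n = len(item_columns[0])

    def sample_var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Total score per respondent across all items
    totals = [sum(col[i] for col in item_columns) for i in range(n)]
    item_var_sum = sum(sample_var(col) for col in item_columns)
    return (k / (k - 1)) * (1 - item_var_sum / sample_var(totals))
```

Perfectly covarying items yield α = 1; items that share no covariance drive α toward 0.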


Table 3 Internal consistency of CORE–Outcome Measure by service setting (Cronbach’s α)
 

The CORE-OM profile of severity of problems
Overall scores
To compare the overall CORE-OM scores in primary and secondary care settings, we generated notched boxplots and histograms presenting the distribution of CORE-OM mean scores for all items (see Figs 1 and 2). In terms of overall mean scores, the two settings showed a strikingly similar distribution. Figure 1 shows that there were four outliers in the primary care sample scoring above the maximum secondary care score of 3.65 and no patient in either setting scored 4. As illustrated in Fig. 2, the distributions are near symmetrical although different in total frequency as a result of the different n in each sample.



Fig. 1 The box encloses the interquartile range (i.e. the middle 50% of scores). The notch is centred around the sample median and the shading around the notch shows the 95% confidence interval. The whiskers extend to the minimum score below the box, and for the secondary care sample extend to the maximum score above the box. The primary care sample has four outliers (1.5-3 times the interquartile range above the 75th centile) shown above the whisker.

 


Fig. 2 The distributions for primary care and secondary care samples appear to the left and right, respectively, of the central y-axis.

 

Domain scores
We calculated mean scores for each domain to determine whether patients in primary and secondary care settings showed a different profile of scores. Table 4 presents CORE-OM scores by domain for the two service settings, together with effect sizes indicating the degree of difference between populations. Although all effect size differences were ‘small’ (i.e. appreciably below 0.20), secondary care patients did report higher levels of risk (effect size -0.15). The well-being domain showed the opposite trend, with primary care patients reporting poorer subjective well-being than secondary care patients (effect size 0.08).


Table 4 The CORE–Outcome Measure domain scores by service setting
 

Item scores
We analysed the mean scores for each of the 34 CORE-OM items across the two service settings to establish whether any items appeared to function differently in these patient groups. Comparison of the mean item scores using Cohen’s effect size methodology indicated that secondary care patients scored higher than primary care patients on all four ‘risk to self’ items: item 9 ‘I have thought of hurting myself’ (effect size -0.14), item 16 ‘I have made plans to end my life’ (effect size -0.12), item 24 ‘I have thought it would be better if I were dead’ (effect size -0.14) and item 34 ‘I have hurt myself physically or taken dangerous risks with my health’ (effect size -0.15). There was no difference between primary and secondary care patients on the two ‘risk to others’ items: item 6 ‘I have been physically violent to others’ (effect size 0.00) and item 22 ‘I have threatened or intimidated another person’ (effect size -0.03). Primary care patients scored higher than secondary patients on item 14 ‘I have felt like crying’ (effect size 0.22) and item 18 ‘I have had difficulty getting to sleep or staying asleep’ (effect size 0.13).

Application of clinical cut-offs
We applied the two cut-off thresholds to the data and Table 5 presents the proportion of patients in each setting above or equal to the cut-off thresholds. Chi-squared tests showed that a significantly higher proportion of primary care patients than secondary care patients were above the clinical cut-off for the well-being domain and non-risk items (P<0.01). However, as noted in the methodology, the statistical power of the data-set was high due to the large n, increasing the likelihood of statistical significance for small differences. Odds ratio (OR) analysis showed that secondary care patients were only marginally less likely to be above these cut-offs (OR=0.84 for well-being; OR=0.85 for non-risk items). Secondary care patients were more likely than primary care patients to be above the risk cut-off (OR=1.23, CI 1.10-1.36) and more likely to be above the ‘severe’ threshold (OR=1.34, CI 1.17-1.53).
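The odds ratios above compare the odds of scoring at or above a cut-off in the two settings. A sketch with illustrative proportions (not the study's exact counts, which underlie Table 5):

```python
def odds_ratio(p1, p2):
    """Odds ratio for two proportions: odds(p1) / odds(p2)."""
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

# Illustrative only: if 50% of one group and 25% of another score above a
# cut-off, the first group's odds of doing so are 3 times those of the second.
example = odds_ratio(0.50, 0.25)
```

An odds ratio near 1 (such as the 0.84 and 0.85 reported above) indicates little difference between the settings, whereas values such as 1.23 and 1.34 indicate modestly elevated odds in secondary care.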


Table 5 Proportion of patients above or equal to clinical cut-off thresholds
 

Patient-rated CORE-OM severity and presenting problems
We used the practitioner rating provided on the CORE-A form to determine patients' presenting problems. We classified each problem as present if given any rating by the practitioner from 1 (‘minimal’) to 4 (‘severe’) and absent if no rating was given. Table 6 presents the mean CORE-OM scores for patients grouped by presenting problem. Groups were not independent because many patients were rated as presenting with more than one problem.


Table 6 The CORE–Outcome Measure risk and non-risk scores by presenting problem
 

The effect size analysis in Table 6 shows that CORE-OM risk scores were a key factor in differentiating secondary care patients from primary care patients across the presenting problems. Secondary care patients had higher risk scores than primary care patients for all presenting problems, except addictions where both primary and secondary patients had relatively high mean risk scores. For patients with psychosis, personality problems and eating disorders (problems traditionally seen in specialist services), risk scores were substantially higher in secondary than in primary care (effect size >0.3). In addition, patients with psychosis, eating disorders and living/welfare problems also showed higher non-risk scores in secondary care than in primary care (i.e. higher levels of overall distress; effect size >0.1).

Practitioner-rated CORE-A profile of severity of presenting problems
We used the CORE-A data to compare the practitioner-rated severity and duration of problems experienced in primary and secondary care settings. Table 7 presents the mean practitioner rating of the severity of the presenting problems. Effect size analysis in Table 7 shows that practitioners rated the severity of anxiety and bereavement higher in primary care than in secondary care settings (effect size >0.2), but the severity of personality problems, cognitive difficulties, eating disorder and physical problems was rated as higher in secondary care than in primary care settings (effect size >0.2). We were mindful that such differences could reflect differential anchor points in terms of perceptions of problems between practitioners from primary and secondary care settings. Accordingly, we sampled two ranges of CORE-OM scores - a lower range (CORE-OM range 1.00-1.60) and a higher range (CORE-OM range 2.20-2.80) - to check that there were no meaningful differences between primary and secondary care practitioners' ratings within these ranges. The mean effect size (low and high range) between primary and secondary care practitioners' ratings for each presenting problem fell below the 0.4 effect size criterion. Table 8 presents the mean rating of duration of the presenting problems in primary and secondary care settings. For all the presenting problems, secondary care patients were rated as having experienced the problem for a greater duration than primary care patients (all effect sizes exceeding 0.2 in magnitude). The greatest difference in problem duration was for psychosis (effect size -0.7).


Table 7 Practitioner-rated CORE–Assessment profile of severity of presenting problems
 

Table 8 Duration of presenting problems
 


   DISCUSSION
 
The purpose of this article was to investigate the suitability and utility of the CORE-OM and CORE-A for assessing the severity of the presenting problems in primary and secondary care-based psychological therapy services.

Suitability
In relation to the appropriateness of these tools in different service settings, the findings show that the CORE-OM is acceptable to clients in both settings (as evidenced by high completion rates) and robust in its structure across different settings (as evidenced by high internal reliabilities), a robustness that extends to most of the item clusters. However, it is acknowledged that this evidence pertains to counselling and psychological therapy services and could differ in other service settings. In addition, a minority of patients completed their measures at their first session rather than at screening or intake assessment. However, the realities of routine practice settings probably demand reasonable flexibility in the pursuit of maximising compliance in completing the assessment measures.

In administering the same measure in both primary and secondary settings, it might be presumed that the CORE-OM would generate a ceiling effect in secondary care services. We found no evidence of this in the data that we examined. However, we distinguish clearly between patients seen in out-patient settings within secondary care services (as reported here) and patients deemed to be within a category that has been referred to as ‘serious and enduring mental illness’. For such patients, the process of understanding and completing a self-report measure might yield results that are not necessarily continuous with those reported here (e.g. they might score lower than their presentation would suggest rather than produce logically higher scores). However, Whewell & Bonanno (2000) reported the risk sub-scale to be ‘clinically valid’, in that CORE-A and CORE-OM scores matched for patients with borderline personality disorder. Where CORE-OM scores might not be considered safe, the CORE-A form completed by the practitioner would be the sole source of information.

Utility
Although we found general homogeneity between primary and secondary care settings in self-ratings on the CORE-OM, there was clear evidence that the CORE-OM discriminated between the two settings: secondary care patients were more likely to score higher on risk and to be above the severe threshold. These two components support the ability of the CORE-OM to discriminate appropriately between service settings, a finding reinforced by practitioners' consistent reporting of greater duration of patients' presenting problems in secondary care. These findings may provide an additional tool to help healthcare professionals recognise patients potentially at risk of suicide (e.g. Gunnell & Harbord, 2003).

Our data showed primary care patients to be characterised by more acute problems (i.e. problems that received a lower duration rating). The self-rated severity may be related to the acute nature of these problems: item analysis showed higher ratings on the item ‘felt like crying’, which is likely to reflect the immediacy of the problems experienced. In contrast, secondary care patients were characterised by more chronic problems (i.e. of longer duration) and higher risk scores on the CORE-OM. This agrees with the therapist-rated chronicity of problems in counselling and clinical psychology practice settings (Cape & Parham, 2001). This profile of patients in secondary services appears to be a logical consequence of referral procedures and waiting times. However, we are mindful that practitioners in primary and secondary care settings may have differential anchor points in their evaluation of the severity of presenting problems. When we controlled for patient-rated severity, we still found at least a 75% overlap in the distributions of primary- and secondary-based practitioners' ratings. Notwithstanding this overlap, our view is that practitioner ratings will be influenced by a myriad of professional and contextual factors that will require further research to ensure standard use in routine settings.

The use of both patient- and practitioner-completed assessment forms marks a step forward from reliance on either patient perception alone or established assessment packages using practitioner ratings alone (e.g. Health of the Nation Outcome Scales; Wing et al, 1998). The use of such data provides a logical base for benchmarking service delivery systems (e.g. Barkham et al, 2001) and adds to a developing literature (e.g. Slade et al, 2001) providing low-cost but reliable measures that can be adopted routinely in mental health settings.


   Clinical Implications and Limitations
 
CLINICAL IMPLICATIONS

LIMITATIONS


   ACKNOWLEDGMENTS
 
Authors affiliated to the Psychological Therapies Research Centre were funded by the Priorities and Needs R&D Levy via Leeds Community Mental Health Trust.


   REFERENCES
 
Barkham, M., Evans, C., Margison, F., et al (1998) The rationale for developing and implementing core outcome batteries for routine use in service settings and psychotherapy outcome research. Journal of Mental Health, 7, 35-47.

Barkham, M., Margison, F., Leach, C., et al (2001) Service profiling and outcomes benchmarking using the CORE-OM: toward practice-based evidence in the psychological therapies. Journal of Consulting & Clinical Psychology, 69, 184-196.

Cape, J. & Parham, A. (2001) Rated casemix of general practitioner referrals to practice counsellors and clinical psychologists: a retrospective survey of a year's caseload. British Journal of Medical Psychology, 74, 237-246.

Cohen, J. (1988) Statistical Power Analysis for the Behavioural Sciences (2nd edn). New Jersey: Lawrence Erlbaum.

Cronbach, L. J. (1951) Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.

Department of Health (2001) The Mental Health Policy Implementation Guide. London: Department of Health.

Elliott, R., Stiles, W. B. & Shapiro, D. A. (1993) Are some psychotherapies more equivalent than others? In Handbook of Effective Psychotherapy (ed. T. R. Giles), pp. 455-479. New York: Plenum Press.

Evans, C., Connell, J., Barkham, M., et al (2002) Towards a standardised brief outcome measure: psychometric properties and utility of the CORE-OM. British Journal of Psychiatry, 180, 51-60.

Gardner, M. J. & Altman, D. G. (1986) Confidence intervals rather than P values: estimation rather than hypothesis testing. BMJ, 292, 746-750.

Gunnell, D. & Harbord, R. (2003) Suicidal thoughts. In Better or Worse: A Longitudinal Study of the Mental Health of Adults, pp. 45-65. London: TSO.

Jacobson, N. & Truax, P. (1991) Clinical significance: a statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19.

Mellor-Clark, J. & Barkham, M. (2000) Quality evaluation: methods, measures and meaning. In Handbook of Counselling and Psychotherapy (eds C. Feltham & I. Horton), pp. 225-270. London: Sage Publications.

Mellor-Clark, J., Barkham, M., Connell, J., et al (1999) Practice-based evidence and standardized evaluation: informing the design of the CORE system. European Journal of Psychotherapy, Counselling and Health, 2, 357-374.

Slade, M., Cahill, S., Kelsey, W., et al (2001) Threshold 3: the feasibility of the Threshold Assessment Grid (TAG) for routine assessment of the severity of mental health problems. Social Psychiatry & Psychiatric Epidemiology, 36, 516-521.

Whewell, P. & Bonanno, D. (2000) The Care Programme Approach and risk assessment of borderline personality disorder: clinical validation of the CORE risk sub-scale. Psychiatric Bulletin, 24, 381-384.

Wilson, E. B. (1927) Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22, 209-212.

Wing, J. K., Beevor, A., Curtis, R. H., et al (1998) Health of the Nation Outcome Scales (HoNOS): research and development. British Journal of Psychiatry, 172, 11-18.

Received for publication February 18, 2004. Revision received September 9, 2004. Accepted for publication September 10, 2004.