Tavistock Marital Studies Institute, London
Tavistock and Portman NHS Trust and Rampton Hospital, London
Correspondence: M. Lanman, Tavistock Marital Studies Institute, Tavistock Centre, 120 Belsize Lane, London NW3 5BA, UK. Tel +44 (0)20 7447 3724; e-mail: monica{at}tmsi.org.uk
Declaration of interest The study was funded by the Lord Chancellor's Department.
See editorial, pp. 193–195, this issue.
ABSTRACT
Aims To test the interrater reliability and construct validity of a joint PRP score for couples.
Method Seven therapists independently rated couples' interactions using the 30-item PRP and segments of videotaped interviews with 19 couples.
Results Interrater reliability was good and correlations between items clearly supported the underlying Kleinian bipolar model used (paranoid–schizoid/depressive positions).
Conclusions Psychoanalytic couple psychotherapists agree in independent judgements of the nature of couple functioning, these judgements being based on envisaging couples in terms of an unconsciously shared state of mind.
INTRODUCTION
Background
It is increasingly recognised that couple relationships make an important
contribution to patients' responses to a very wide range of physical and
emotional problems, and that couple-focused interventions are helpful in many
of these situations (Leff et al,
2000). For many couple and family therapies, a fundamental axiom
is that the intervention is directed at, and works through, the couple or
family system, and not the individuals. There is some evidence that the nature
of the change sought by therapy is important in predicting the durability of
that change (Snyder et al,
1991). Hence it becomes important to be able to measure different
kinds of change sought by different therapies in order to test for a link
between type of therapy and durability of change. Such information is
important also in service development and training. Researching analytically
informed couple therapies requires measures of the couple relationship that
detect unconscious as well as conscious changes; measures of symptomatic
improvement alone are inadequate for this purpose. However, this area tends to
be neglected because recognition and evaluation of psychological functioning
at an unconscious level involves assessing a complex matrix of behaviours and
feelings, using inference as well as overt evidence, whether assessing
individuals or couples (Milton,
1997). But, to borrow a quotation from Slade & Priebe's recent
editorial, the challenge is 'to make the important measurable, not the
measurable important' (Robert McNamara, former US Secretary of Defense,
quoted in Slade & Priebe,
2001).
Although measures of individual psychological functioning abound, few are psychoanalytically based and the contribution of psychoanalytic thinking to mental health has been controversial for many reasons, including the difficulty in providing evidence of the objectivity, reliability and validity of its judgements. It might be compared with the state of diagnosis in psychiatry before the series of studies that pioneered assessment of reliability and validity, including the use of operationalised ratings and videotaped interviews (e.g. Spitzer et al, 1967; Wing et al, 1974; Wing & Nixon, 1975). There has been some progress in this regard in psychoanalytic theory and therapies for individuals, but assessment of psychoanalytic couple therapy has lagged behind.
Particular problems arise for couple therapy in that there is no perfect formula for combining the individual scores for each partner, to yield a couple score. To capture and evaluate changes in a couple's patterns of relatedness, a measure is needed that looks at the couple as a unit, and we believe that there are currently no measures with established reliability and validity that assess the unconscious functioning of a couple. Hence it is necessary that such a measure be developed to complement measures of individuals, to provide an empirical test of the theoretical understanding on which psychoanalytic couple therapy is based.
Psychoanalytically informed couple therapy has a strong theoretical base and a strong body of anecdotal case reports and case series but traditionally it has not drawn on or developed nomothetic measures of the theory. Psychoanalytic couple therapists think of the patterns of interaction established between two individuals making up a couple as being rooted in shared or similar aspects of their individual psychological states of mind, either conscious or unconscious, such as expectations, anxieties and defences. These interact via unconscious processes of mutual projection (Ruszczynski, 1993). What this means is that the partners are understood to deal with certain rejected or feared aspects of themselves by assuming them to be located in the other, and act accordingly without necessarily being aware consciously of doing so. It is thought that partners tend to choose each other partly because there is some unconscious fit between them: the expectations and anxieties involved are, to some extent, similar for both and each has a way of coping with these that fits in with the projections from the other (Balint, 1993).
This approach is influenced by psychoanalytic ideas about mental functioning derived from the work of Klein (1935, 1946) and widely used in contemporary psychoanalysis (see Britton, 1998: pp. 29-40). In particular, it draws on the ideas referring to two constellations of psychological functioning known as paranoid–schizoid and depressive, each associated with characteristic unconscious defensive structures. Briefly, paranoid–schizoid refers to a state of mind in which uncomfortable feelings tend to be denied in the self and experienced as located somewhere else, making the environment or the other person seem threatening; depressive refers to a state of mind in which the self feels guilt and responsibility for damage to, or failings in, others or the environment. (It should be noted that there is some overlap between the psychoanalytic and psychiatric uses of the terms paranoid and schizoid, but the use of depressive in the two fields is different, psychiatric depression often having paranoid–schizoid rather than depressive aspects in Kleinian analytical assessment.)
Thinking about couples as tending to share a pattern of relating at an unconscious level does not imply that the two individuals necessarily appear or feel similar, or are consciously aware of what they have in common. They may appear to be like chalk and cheese and yet one may find that in the course of therapy they exchange roles at times. How rigidly or flexibly different psychological functions are distributed between the partners is regarded by couple psychotherapists as a key factor in determining the contribution of the relationship to emotional and physical health.
A measure of this shared psychology could provide a joint couple score. If reliable, it would provide support for the way in which psychoanalytic couple psychotherapists understand relationships, and be of use in the evaluation of relationship therapies.
This study aimed to develop and test an instrument for this purpose. The question was: could independent raters agree on their assessment if they were asked to rate the couple as a single unit, thinking in terms of a single state of mind or mode of psychological functioning being shared by both partners?
Hypotheses
Our hypotheses were as follows.
METHOD
The raters were clinicians who were either psychoanalytic couple psychotherapists trained at the Tavistock Marital Studies Institute in London (n=6) or in training as such (n=1), and they were asked to use their clinical judgement in rating the states of mind and patterns of relating of the series of couples on the basis of the first 30 min of the couples' consultations with similarly qualified therapists. The authors were trained by Hobson and Patrick in the use of the PRP, and the raters had two and a half hours of guided practice in its use, which involved discussion and the rating of two brief extracts from videotapes of couple consultations not then used in the study.
Extracts from 19 videotaped consultations were rated. Several different therapists conducted the consultations. Out of a total of 26 available tapes, four were discarded because the sound quality was too poor and a further three on the grounds that the consultation got going so slowly that there was not enough material to rate within the first half-hour. These assessments had been conducted according to routine clinical practice in a specialist couple psychotherapy unit.
The first objective of the study was to assess interrater agreement. This
was done as in the original study by the use of Kendall's coefficient of
concordance W (Siegel,
1956: pp. 229-238), as calculated by SPSS version 10.07
(Norusis, 1992). Kendall's
coefficient W lies in the range 0-1, where 1 indicates perfect
agreement among the raters on the rank order of the videotapes. The type I
error criterion was set at 0.05. Formal statistical power was not
calculated in advance (modelling power for multiple raters is complex) but the
decision was made to use slightly more raters and videotapes than in Hobson
et al (1998), to
ensure that at least as much statistical power was available. Reliability was
compared with that reported for rating individuals and differences were tested
by assessing how many of the 30 items were rated more reliably in one study
than the other, applying Wilcoxon's non-parametric test of ranked
differences.
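As an illustration of the concordance statistic described above, the following is a minimal sketch of Kendall's W for one item rated by several raters across a set of videotapes. It is not the SPSS routine used in the study: the function names (`average_ranks`, `kendalls_w`) and the input layout are hypothetical, and the sketch assumes tied scores receive averaged ranks with the usual tie correction in the denominator.

```python
def average_ranks(scores):
    """Rank scores 1..n, giving tied scores the mean of the ranks they span."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    tie_term = 0  # running sum of (t^3 - t) over groups of t tied scores
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        t = end - start + 1
        tie_term += t ** 3 - t
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks, tie_term


def kendalls_w(ratings):
    """ratings[r][s] is the score rater r gave subject s (hypothetical layout)."""
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(row) for row in ratings]
    rank_sums = [sum(ranks[s] for ranks, _ in ranked) for s in range(n)]
    mean_sum = m * (n + 1) / 2
    s_stat = sum((total - mean_sum) ** 2 for total in rank_sums)
    ties = sum(tie_term for _, tie_term in ranked)
    # Assumes at least one rater discriminates between subjects,
    # otherwise the denominator is zero.
    return 12 * s_stat / (m ** 2 * (n ** 3 - n) - m * ties)
```

With identical rank orders from every rater the function returns 1, and with two raters in exactly opposite order it returns 0, matching the 0-1 range described in the text.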
We also report overall reliability, to bring the assessment more in line
with diagnostic and other ratings that are made on the summation of ratings
from multiple separate indicators or items. The parameter used to provide a
direct comparison with many reliability studies of multi-item measures or
interrater studies is Cronbach's coefficient α (equivalent to the
mixed-effect, consistency, intraclass correlation coefficient (ICC); see
Bravo & Potvin, 1991; MacLennan, 1993).
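The coefficient can be sketched as follows, treating each rater as an 'item' and each couple as a case. This is an illustrative sketch only: the function name `cronbach_alpha` and the input layout are assumptions, and the study's own figures came from SPSS.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """ratings[r][s] is the score from rater r for subject s (hypothetical layout).

    alpha = k/(k-1) * (1 - sum of per-rater variances / variance of summed scores)
    """
    k = len(ratings)
    n = len(ratings[0])
    rater_variances = sum(variance(row) for row in ratings)
    totals = [sum(row[s] for row in ratings) for s in range(n)]
    # Assumes the summed scores vary across subjects (non-zero variance).
    return k / (k - 1) * (1 - rater_variances / variance(totals))
```

Because this is a consistency coefficient, a rater who scores every couple one point higher than a colleague still yields a value of 1: only the pattern across couples matters, not the absolute level.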
The second objective of the study was to assess whether the ratings of the couples suggested that the paranoid–schizoid and depressive positions were inversely related, such that a couple rated higher on the 15 PRP depressive items would be more likely to be rated lower on the paranoid–schizoid items. As previously, this was assessed with two separate tests on the mean ratings on each item across the seven raters.
The first test involved forming two composites for each of the paranoid–schizoid and depressive types, allocating the first seven and the last eight items of each type exactly as in Hobson et al (1998). The resulting four composites were subjected to maximum likelihood exploratory factor analysis. If the first factor accounts for a very large proportion of the variance across those four composite ratings, this indicates that the paranoid–schizoid and depressive items are opposed. Formal tests comparing the proportion of variance in the first factor in the two studies are not readily available. However, a markedly lower proportion of variance in the first factor in this study would raise questions about the relative construct validity of the PRP when used to rate individuals and couples.
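The idea of a dominant first dimension can be illustrated with a short sketch. Note the substitution: the sketch uses the first principal component of the correlation matrix, found by power iteration, rather than the maximum likelihood factor analysis used in the study, so its figures would not reproduce the study's exactly. All function names and the input layout are hypothetical.

```python
import math
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlations; columns[j] holds variable j's values across cases."""
    n = len(columns[0])
    z = []
    for col in columns:
        mu, sd = mean(col), pstdev(col)  # assumes no variable is constant
        z.append([(x - mu) / sd for x in col])
    p = len(columns)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(p)]
            for i in range(p)]


def first_component(corr, iterations=1000):
    """Power iteration for the largest eigenvalue and its unit eigenvector."""
    p = len(corr)
    v = [1.0 / (i + 1) for i in range(p)]  # deliberately uneven start vector
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(p) for j in range(p))
    return eigenvalue, v


def first_component_share(columns):
    """Return (eigenvalue, share of total variance, loadings) for component 1."""
    corr = correlation_matrix(columns)
    eigenvalue, v = first_component(corr)
    if sum(v) < 0:  # eigenvector sign is arbitrary; orient it positively
        v = [-x for x in v]
    loadings = [math.sqrt(eigenvalue) * x for x in v]
    return eigenvalue, eigenvalue / len(columns), loadings
```

For standardised variables the eigenvalues sum to the number of variables, so the share is the first eigenvalue divided by that number. A composite that runs against the others shows up as a loading of opposite sign.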
The second, more fundamental test of the paranoid–schizoid/depressive dimensionality is an exploratory principal component analysis of all 30 mean ratings (after reversing the paranoid–schizoid items). Items showing negative loadings on the first component would be failing to fit this paranoid–schizoid/depressive dimensional model. As previously, items that showed loadings below 0.3 on the first component were censored as being unlikely to represent reliable variance on that dimension. The binomial distribution was used to test the likelihood that the items would have loaded as strongly as they did by chance alone.
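The binomial check can be sketched as an upper-tail probability. The null probability of 0.5 per item (each item equally likely to load in either direction) and the one-sided form are assumptions of this sketch; the text does not state exactly how its P value was derived, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more 'successes'."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

For example, `binomial_upper_tail(9, 10)` gives 11/1024, about 0.011: the chance that at least 9 of 10 items load in the predicted direction if direction were a coin toss.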
RESULTS
The Kendall concordance coefficients for this study (W) for each of the 30 items are shown in Table 1. All were statistically significant at P < 0.05 (the lowest was for item 30: W=0.24, P=0.04). Concordance was moderately higher than in Hobson et al (1998: mean=0.44 v. 0.37, median=0.44 v. 0.34; binomial test, P=0.006; Wilcoxon P=0.014). Intraclass correlation coefficients, which are based on scores, not ranks, are shown for comparison with other reliability studies and are generally very acceptable for single-item reliability on seven raters.
Inspection revealed that items 16, 24, 25, 26, 27, 29 and 30 showed lower values than in Hobson et al (1998). The finding that the majority of these items were general affect items suggested that a post hoc analysis might throw more light on this because the 30 items of the PRP fall into three groups of ten. The mean reliability for the first ten items in this study was 0.50 (cf. 0.33 in Hobson et al, 1998); for the second group of ten the comparison was 0.41 v. 0.34; and for the last group of ten the comparison was 0.42 v. 0.43.
The dimensionality check showed that the first-factor eigenvalue of 3.47 accounted for 87% of the variance. This is higher than the values of 3.24 and 76%, respectively, found by Hobson et al (1998), indicating an even larger first dimension of variation across the couples rated.
Finally, the test of whether or not the 30 items displayed a bipolar structure in which the paranoid–schizoid items correlated negatively with the depressive items showed all but one item (item 25) loading above 0.3, in contrast to the finding of six low-loading items in Hobson et al (1998). The low-loading item had a negative loading of -0.25; hence 29 of the 30 items showed loading in the predicted direction. The probability of this happening by chance alone is vanishingly small (P=9 × 10⁻¹⁰).
In light of the strong support for the first major dimension of variation,
ICCs (equivalent to Cronbach's α) for the 30 items were calculated. The
overall α for all 210 ratings (7 raters, 30 items) was 0.98 and for each
rater it was 0.87, 0.94, 0.96, 0.90, 0.74, 0.96 and 0.87. The overall
interrater reliability on the summing of the 30 items was 0.92.
DISCUSSION
Construct validity
The second finding was that the data showed a clear first dimension of
variation on which the paranoid–schizoid items correlated negatively
with the depressive position items, with only one item not fitting the
predicted pattern. This suggests that the Kleinian contrast of
paranoid–schizoid and depressive may have strong construct validity as
rated by the PRP. If the items showed no such empirical construct validity,
the finding that 29 of the 30 loaded as expected would happen in about one in
a billion such experiments.
What is being tested
It is important to be clear what is, and what is not, tested by the study.
The question of whether raters are rating a 'shared state of mind'
in the couples is not addressed directly by any one parameter in the analyses.
Equally, whether a paranoid–schizoid or depressive unconscious state of
mind is shared by the members of each couple is not tested directly either.
What is tested is whether there are some shared qualities within each couple
that can be rated by the majority of the raters on the majority of videotape
extracts for the majority of the items (98% overall). The couples are seen to
differ on these items and, if there were not some recognisable shared
qualities of the couples, neither the interrater reliability nor the strong
validation of the single dimension of paranoid–schizoid/depressive would
have been seen. The differences that were found reliably and that associated
items as predicted with a bipolar dimension of difference between the couples
do not prove the existence of Kleinian positions. Similarly, the reliable
association of Schneiderian symptoms with each other, and separate from the
major symptoms of anxiety, does not prove the existence of schizophrenia or
anxiety as useful diagnostic categories. However, finding either unreliable
ratings or no association of ratings as predicted by analytical theory would
have supported rejection of either the PRP as a measure or the analytical
theory of couple therapy, or both. The finding of reliability and dimensional
opposition for the couple data supports the idea that trained raters can infer
a 'couple mind'. Their ratings are congruent with theory. In this
way these findings support construct validity.
The circularity question
A further question is whether the ratings followed from training of the
raters in such a way as to make the correlations between the ratings follow
from theory to rating rather than allowing rating to test theory. There are
well-recognised ways in which spurious construct validity can be shown. For
example, this can be seen in relation to historical stereotypes of the
'epileptic personality'. A formal rating study of the concept some
decades ago might have shown apparent construct validity with all the items
loading as predicted, but if epilepsy were not observable, so that the other
characteristic could not be rated by the halo effect, there would have been no
interrater reliability. Similarly, prevalent American and Russian definitions of
schizophrenia before the 1980s might have shown apparent reliability, and
validity might have been shown, because the descriptive process was circular
(Wing & Nixon, 1975).
However, the 30 items used here covered three distinct domains and none of them individually is specific to the framework from which psychoanalytic formulations are derived. The raters were not instructed to formulate the couples they saw in the Kleinian positional spectrum; rather, they were asked to make the ratings without formulation. Hence, vulnerability to the charge of spurious construct validity appears to be minimised here. All construct validation must be a process of survival of a long series of empirical tests, not merely of one, and none is definitive. The final test that changes the theory is rare in the human and social sciences.
Clinical Implications and Limitations
LIMITATIONS
APPENDIX
Couple code number:
General instructions
Please circle a score for the couple against each question. Please note
carefully what the scale represents, i.e. 1 is 'Very
uncharacteristic', and 5 is 'Very characteristic'. We want you to
consider the couple as a unit, as if they were one person. Or, to put it
another way, rate them in terms of their shared state of mind, thinking in
terms of splitting and mutual projection. We want you to make a clinical
judgement in answering the questions, that is, a judgement using all the
data available to you as a clinician, including what the couple say and do,
which could be overtly mostly about some third party such as a child, and
also including your countertransference response and a degree of clinical
inference about the internal object relations you are observing. But this
would not include a deeply unconscious structure that the rater could
only infer theoretically. You need to have some clinical evidence for your
judgement.
Section 1: Personal relatedness
On this first part of the scale, we would like you to consider the quality
of what the couple experiences to happen between them, or between either of
them and other people, or between other people (as reported), and to make
judgements on the extent to which each of the following characterise the
couple's overall functioning (considered as two sides of a whole). The quality
of relatedness between couple and interviewer also should be considered in
making a judgement.
Characteristic relatedness patterns
involve:
Section 2: Characteristics of people (objects)
In this second part, we would like you to consider the nature of the people
that the couple feel they encounter (possibly reflecting internal objects).
The characteristics may be inferred from behaviour during the interview, and
from the couple's own descriptions. The picture may contain apparent
contradictions (i.e. objects of very differing natures, e.g. very good
and very bad figures). Ratings may also apply to a couple's
experience of themselves as well as of others. Once more, we would like you to
judge the extent to which the following characterise the couple's overall
experiences of people.
The figures are experienced
as:
Section 3: Predominant affective states
Please rate the degree to which the following characterise or underlie the
couple's conscious predominant affective state. We would encourage you again
to use your intuitive and clinical skills in judging what the material
expresses about overall functioning, in addition to basing ratings on explicit
evidence. But you should have some evidence (which can be your
countertransference, or a clear sense that what you are seeing is a defence
against something) for your judgement, other than theoretical
assumption.
REFERENCES
Balint, E. (1993) Unconscious communications between husband and wife. In Before I was I: Psychoanalysis and the Imagination (eds J. Mitchell & M. Parsons), pp. 207-218. London: Free Association Books.
Beck, A. T., Ward, C. H., Mendelson, M., et al (1961) An inventory for measuring depression. Archives of General Psychiatry, 4, 561-571.
Bravo, G. & Potvin, L. (1991) Estimating the reliability of continuous measures with Cronbach's alpha or the intraclass correlation coefficient: toward the integration of two traditions. Journal of Clinical Epidemiology, 44, 381-390.
Britton, R. (1998) Belief and Imagination. London: Routledge.
Hamilton, M. (1967) Development of a rating scale for primary depressive illness. British Journal of Social and Clinical Psychology, 6, 278-296.
Hobson, P., Patrick, M. & Valentine, J. (1998) Objectivity in psychoanalytic judgements. British Journal of Psychiatry, 173, 172-177.
Klein, M. (1935) A contribution to the psychogenesis of manic depressive states. International Journal of Psycho-Analysis, 16, 282-310.
Klein, M. (1946) Notes on some schizoid mechanisms. International Journal of Psycho-Analysis, 27, 99-110.
Leff, J., Vearnals, S., Wolff, G., et al (2000) The London Depression Intervention Trial. Randomised controlled trial of antidepressants v. couple therapy in the treatment and maintenance of people with depression living with a partner: clinical outcome and costs. British Journal of Psychiatry, 177, 95-100.
MacLennan, R. N. (1993) Interrater reliability with SPSS for Windows 5.0. American Statistician, 47, 292-296.
Milton, J. (1997) Why assess? Psychoanalytic assessment in the NHS. Psychoanalytic Psychotherapy, 11, 47-58.
Norusis, M. J. (1992) SPSS for Windows. Advanced Statistics. Chicago, IL: SPSS Inc.
Ruszczynski, S. (1993) Psychotherapy with Couples. London: Karnac Books.
Siegel, S. (1956) Nonparametric Statistics for the Behavioral Sciences. Tokyo: McGraw-Hill International.
Slade, M. & Priebe, S. (2001) Are randomised controlled trials the only gold that glitters? British Journal of Psychiatry, 179, 286-287.
Snyder, D. K., Wills, R. M. & Grady-Fletcher, A. (1991) Long-term effectiveness of behavioural versus insight-orientated marital therapy: a four-year follow-up study. Journal of Consulting and Clinical Psychology, 59, 138-141.
Spitzer, R. L., Cohen, J., Fleiss, J. L., et al (1967) Quantification of agreement in psychiatric diagnosis: a new approach. Archives of General Psychiatry, 17, 83-87.
Wing, J. & Nixon, J. (1975) Discriminating symptoms in schizophrenia: a report from the International Pilot Study of Schizophrenia. Archives of General Psychiatry, 32, 853-859.
Wing, J. K., Cooper, J. E. & Sartorius, N. (1974) Measurement and Classification of Psychiatric Symptoms. An Instruction Manual for the PSE and Catego Program. Cambridge: Cambridge University Press.
Received for publication December 6, 2001. Revision received August 7, 2002. Accepted for publication August 7, 2002.