Division of Epidemiology, Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
Department of Psychiatry, West Virginia University, Morgantown, West Virginia, USA
Department of Psychiatry, Cathay General Hospital, Taipei, Taiwan
Section of Social and Epidemiological Psychiatry, Department of Psychiatry, University of Leicester, Leicester, UK
Department of Psychiatry, University of Nottingham, UK
Department of Psychiatry, Lo-Tung Poh-Ai Hospital, Lo-Tung, I-Lan, Taiwan
Department of Psychiatry, Washington University, St Louis, Missouri, USA
Department of Psychiatry, Chang Gung University College of Medicine and Chang Gung Memorial Hospital, Tao-Yuan, Taiwan
Department of Neuropsychiatry, Kai-Suan Psychiatric Hospital, Kaohsiung, Taiwan
Division of Epidemiology, Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
Correspondence: Professor Andrew T. A. Cheng, Division of Epidemiology, Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan. Fax: 8862 2782 3047; e-mail: bmandrew{at}ccvax.sinica.edu.tw
Declaration of interest This work was supported by a grant from the Taiwan National Health Research Institutes (DD01-861X-MD-601S).
* Results from this study were presented at the XIII Congress of the
International Federation of Psychiatric Epidemiology, March 1999, Taipei,
Taiwan.
ABSTRACT
Aims To assess the cross-cultural clinical equivalence and reliability of a Chinese version of the World Health Organization Schedules for Clinical Assessment in Neuropsychiatry (SCAN).
Method UK-US and Taiwanese groups of psychiatrists used Chinese and English transcripts of videotape interviews of Taiwanese patients to discuss cross-cultural issues and ratings of SCAN items. Item ratings were compared quantitatively, both individually and pooled by SCAN section.
Results Chinese equivalents were found for all SCAN items. No between-group differences were found for most individual items, but there were differences for some scaled items. Average agreement between the two groups was 69-100%.
Conclusions Cross-cultural implementation based on SCAN in Taiwan appears valid.
INTRODUCTION
The results of cross-cultural testing of an earlier version of SCAN at the level of diagnostic categories show satisfactory levels of agreement (Easton et al, 1997; Wing et al, 1998a), but at the important level of individual symptom items there are as yet no published reports of cross-cultural equivalence or interrater reliability.
The small-scale interrater reliability study reported here was based upon the use of interviews with Chinese patients that were conducted in Chinese by SCAN-trained Chinese psychiatrists. This allows a detailed examination of both translation and cross-cultural equivalence problems at the same time.
METHOD
Preparation of the new Chinese SCAN; translation and
back-translation
Translation and back-translation as recommended by Sartorius & Kuyken
(1993) were carried out as
follows. In 1993 translation of the English SCAN into Chinese (Mandarin) in
Taiwan was started by a group of native Taiwanese psychiatrists. In the
translation, Fukienese (the main native dialect in Taiwan) as well as Mandarin
phrases and terms were used. Group discussions then were held by the
translators, to compare in detail each section of the SCAN in English and
Chinese. Further improvements were made to the text, giving conceptual
equivalence priority over word-for-word linguistic equivalence. A
back-translation to English was then carried out by bilingual mental health
professionals who had not been trained in SCAN. This back-translation was
reviewed for distortions from the original document, and areas of uncertainty
or difficulty were discussed and resolved by the panel of psychiatrists.
Training of Chinese psychiatrists in SCAN and first adjustments to
the new version
A training course for 12 Taiwanese academic psychiatrists was conducted in
Taipei in 1996 in English by two experienced SCAN trainers according to the
established format developed by WHO. Eight of the Taiwan trainees formed a
local SCAN club and held monthly meetings to review the
translation and to test SCAN. Each psychiatrist contributed interviews that
were designed to determine whether patients understood the SCAN questions and
concepts. Based on these interviews, minor changes were made to the interview
and probes.
Preparation of interview videotapes
Videotape recordings of clinical interviews with 40 patients using the
Chinese SCAN 2.1 version (World Health
Organization, 1999) were made by the SCAN club members in Taiwan
for this study. These were regarded as a pool of videotapes from which a
manageable number of the best possible quality for translation and detailed
discussion could be selected. The patients were selected as being
representative of four areas of psychopathology (neurotic, affective,
substance misuse or dependence, and psychotic) and were intended to be
typical cases seen in hospital practice likely to provide ratings on a wide
variety of SCAN items. About half of the recordings were excluded because of
unsatisfactory technical quality (usually poor sound) or because of patients
who gave many vague replies or who were very talkative; some interviews were
excluded because of periods of unsatisfactory interviewing techniques.
Selection in this way was appropriate, because the main properties required of
the final group of interviews were simply that the clinical states of the
patients should be reasonably typical and that the interviews should not set
special problems for the translators. A final set of 16 Chinese videotapes was
retained, with four patients in each of the four areas of psychopathology.
Transcripts in Chinese were prepared from the videotapes by local bureaux. These were reviewed and revised by each interviewer to produce a final Chinese version. The Chinese transcripts then were sent to another bureau for English translation. The English transcripts were reviewed by a bilingual clinician and grammatical errors were corrected. English subtitles were added to copies of the videotapes and time codes then were matched back to the printed transcripts for reference use.
Rating of the videotapes
Ratings were carried out by two groups of psychiatrists. The native
English-speaking (US-UK) group consisted of one British, one
Irish and two American psychiatrists. All had had extensive experience
teaching and using SCAN in their own centres and also had conducted SCAN
courses in centres and countries other than their own. The native
Chinese-speaking (Taiwan) group consisted of six of the original
SCAN club group. It was decided that ratings would be based primarily on
transcripts but that excerpts of the videos would be viewed so that raters
could check for clinically relevant abnormalities of behaviour and gain a
general sense of each patient. A special meeting of the two groups of
psychiatrists, lasting 1 week, took place in Taipei in September 1998 to
complete the comparison.
Symptom ratings
Ratings of one interview that was sent ahead to all the psychiatrists were
compared first. Discrepancies were discussed but changes were not made to the
ratings. For the remainder of the cases, videotapes with subtitles were shown
to the group for 5-10 min, without discussion. Each rater then used the
printed transcript in his or her native language to rate the SCAN items. One
patient from each type of psychopathology was rated, and after the rating was
completed it was discussed item by item. Again, no changes were made in the
ratings. Subsequently, the remaining transcripts were rated without detailed
item-by-item discussion. If conceptual issues arose during the rating, they
were noted and presented after completion of the rating for group discussion.
All ratings were entered into a spreadsheet for later comparison and
quantitative analysis.
Clinical diagnosis
Although a subsidiary issue in this study, the raters recorded their
opinion about the likely clinical diagnosis, using ICD-10, Chapter V
(World Health Organization,
1993). Excellent agreement on the clinical diagnoses was found.
This aspect of the study is not described further here, but details may be
obtained from A.T.A.C. upon request (or seen at
http://www.mdlogix.com/id115.htm).
Data management
Ratings made on paper recording forms were entered into a spreadsheet and
the summarised numerical results within and across the two groups of raters
(US-UK and Taiwan) were available for group discussion shortly after all the
ratings were completed.
Items rated 8 or 9 (SCAN codes for "uncertain" and
"missing") were treated as missing values in mean score analysis.
At the item level, a binary representation of the presence or absence of a
clinically significant symptom was of interest. This is based on the explicit
rules of SCAN interviewing in which 2 and 3 (in Part 1) and 1, 2 and 3 (in
Part 2) are considered clinically significant. The same data
transformation principles are used in diagnostic algorithms. Thus, the
definite presence of a symptom meeting the glossary definition would be
indicated by a value of 1, and anything less indicated by a value of 0.
Specifically, item scores using Rating Scale I in Part 1 of the SCAN were
transformed into dichotomous (0, 1) values by mapping 0 and 1 scores to 0
(0, 1 → 0) and 2 and 3 scores to 1 (2, 3 → 1). In Part 2 of the SCAN, items
using Rating Scale II were similarly converted into dichotomous values, except
that 1, 2 and 3 values were all mapped to 1.
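The mapping described above can be illustrated with a minimal Python sketch. The function name `dichotomise`, the `part` argument and the decision to return `None` for codes 8 and 9 are illustrative assumptions; the text states only that 8 and 9 were treated as missing in the mean score analysis, and only the scores 0 to 3 named in the text are handled here.

```python
from typing import Optional

# SCAN codes for "uncertain" and "missing"; treating them as missing in the
# binary transformation as well is an assumption made for this sketch.
MISSING_CODES = {8, 9}


def dichotomise(score: int, part: int) -> Optional[int]:
    """Map a scaled SCAN item score to 1 (clinically significant) or 0.

    part=1 applies the Rating Scale I rule (0, 1 -> 0; 2, 3 -> 1).
    part=2 applies the Rating Scale II rule (0 -> 0; 1, 2, 3 -> 1).
    """
    if part not in (1, 2):
        raise ValueError("part must be 1 or 2")
    if score in MISSING_CODES:
        return None
    if score not in (0, 1, 2, 3):
        # Only the scores mentioned in the text are handled in this sketch.
        raise ValueError(f"unsupported score: {score}")
    threshold = 2 if part == 1 else 1
    return 1 if score >= threshold else 0


# A score of 1 is below threshold in Part 1 but counts as present in Part 2.
part1_binary = [dichotomise(s, part=1) for s in (0, 1, 2, 3, 8)]
part2_binary = [dichotomise(s, part=2) for s in (0, 1, 2, 3, 9)]
```

The only difference between the two parts is the threshold, so a single function with a part-dependent cut-off is enough to express both rules.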
Clinical equivalence and qualitative analyses
Following each rating session, discussion took place between the two groups
of interviewers and detailed notes of these qualitative findings were kept.
The purpose of these discussions was to enhance the clinical validity of the
Chinese SCAN symptom items cross-culturally by making sure that the Chinese
items addressed the same concept as the English. In addition, the possible
effects of cultural differences in social desirability upon the responses of
the patients to questions were discussed.
Quantitative analyses; interrater reliability
Owing to the small sample size, the statistical methods employed are
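considered as being primarily descriptive. However, two hypotheses were
discussed and agreed upon at the start of the meeting, and these guided some
aspects of the data analyses: that the Taiwan raters would rate anxiety
symptoms higher than the US-UK raters, and that the US-UK raters would rate
depressive symptoms higher than the Taiwan raters.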
Comparison of scaled item ratings
For each patient, single-item group means were calculated for the US-UK and
Taiwan groups of raters. A number of items from the various SCAN sections were
omitted because they were not rated at all or there were too few ratings to
make comparisons. Overall mean and standard deviations for items in each SCAN
section were calculated across patients within each diagnostic group (e.g.
Affective). The means and standard deviations were compared between the two
rater groups using paired two-tailed t-test statistics for individual
items and for items grouped by SCAN section.
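As an illustration of the comparison described above, the following Python sketch computes a group mean that skips codes 8 and 9 and then a paired t statistic on two lists of group means. It uses only the standard library and stops at the t statistic and degrees of freedom rather than a P value. The function names, the choice of pairing unit and the example figures are hypothetical and are not taken from the study data.

```python
import math
from statistics import mean, stdev

MISSING_CODES = {8, 9}  # SCAN codes for "uncertain" and "missing"


def group_mean(ratings):
    """Mean of one group's ratings for one item, ignoring codes 8 and 9."""
    valid = [r for r in ratings if r not in MISSING_CODES]
    return mean(valid) if valid else None


def paired_t(xs, ys):
    """Return (t, degrees of freedom) for paired observations xs and ys."""
    if len(xs) != len(ys):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are needed")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical group means for one item across four patients.
taiwan_means = [2.0, 1.5, 2.5, 1.0]
us_uk_means = [1.5, 1.0, 2.0, 1.0]
t_value, df = paired_t(taiwan_means, us_uk_means)
```

The resulting t value would then be referred to a two-tailed t distribution with `df` degrees of freedom; with so few pairs the result is best read descriptively, in line with the caution expressed above.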
Comparison of binary-transformed item ratings
Three methods were used to examine this aspect of items present or
absent:
RESULTS
Second, in translating the item for depressed mood, the Chinese version also required slight modification. Just as English has several vernacular terms for depression (e.g. "down in the dumps", "blue"), so does Chinese (e.g. "fallen into the valley", "heart not clear up", "sour heart"). Similarly, the English word "guilt" has no exact Chinese equivalent. The original SCAN includes "blamed yourself" and "ashamed of yourself", which were judged to be relevant to the Chinese. In addition, "loss of face", a common term in Chinese, was also incorporated into the Chinese SCAN.
Agreement was reached on similar modifications to a number of other items in order to improve the conceptual equivalence of the new Chinese version.
Social desirability set
Another conceptual area had to do with social desirability. For example, it
was thought to be difficult for subjects in Taiwan to answer the question,
"Would you say you were more calm and collected, less prone to
irritability, than most people?" People in Taiwan have difficulty in
answering a question worded in a positive manner because they are taught to be
humble. In the Taiwan version the polarity of the probes was reversed to avoid
this response bias problem. Thus the interview text reads "Would you say
that you are more prone to be nervous and tense than most people?" This
may also be a problem in countries such as Japan.
Similarly, in asking people in Taiwan about loss of enjoyment (anhedonia), the concept of enjoying life may not apply because there is a cultural bias against admitting that activities are enjoyed. The cultural concern is that such an admission may be interpreted as boasting. To deal with this point, the SCAN item was modified by substituting a list of personal activities, and the respondent is asked about any changes in the level of participation in them.
A complete list of the SCAN items and the associated issues that were discussed has been compiled and is available from A.T.A.C. upon request.
The SCAN item interrater reliability
Table 1 shows the
quantitative results for selected representative individual SCAN items that
differed significantly and the overall section summary ratings (grand
means) for both scaled ratings and binary-transformed ratings,
comparing the US-UK and Taiwan groups of raters. (Tables containing the larger
set of items can be obtained from A.T.A.C. upon request, or seen at
http://www.mdlogix.com/id115.htm)
Seven out of 12 sections differed significantly in the grand mean ratings for the section between the Taiwan and US-UK groups when using scaled data, but only three of these differences were significant when using binary ratings. For all the sections, the group percentage agreement between the Taiwan group and the US-UK group was good, ranging from a low of 69% for Section 8 to 100% for Section 11.
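The precise definition of group percentage agreement is not given in this section, so the Python sketch below shows only one plausible reading, stated here as an assumption: each group's binary rating for an item is taken as the majority of its raters' binary ratings (ties counted as absent), and agreement is the percentage of items on which the two group ratings coincide. All names and example values are hypothetical.

```python
def group_binary(ratings):
    """Majority binary rating for one item; None entries are missing values."""
    valid = [r for r in ratings if r is not None]
    if not valid:
        return None
    # Assumption: a tie is counted as 0 (symptom not definitely present).
    return 1 if sum(valid) * 2 > len(valid) else 0


def percentage_agreement(group_a, group_b):
    """Percentage of items on which two lists of group ratings coincide."""
    pairs = [(a, b) for a, b in zip(group_a, group_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no items rated by both groups")
    agreed = sum(1 for a, b in pairs if a == b)
    return 100.0 * agreed / len(pairs)


# Hypothetical binary ratings: one inner list per item, one entry per rater.
taiwan_items = [[1, 1, 0], [0, 0, 0], [1, 1, 1], [1, 0, None]]
us_uk_items = [[1, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 0]]

taiwan_group = [group_binary(item) for item in taiwan_items]
us_uk_group = [group_binary(item) for item in us_uk_items]
agreement = percentage_agreement(taiwan_group, us_uk_group)
```

Other definitions, such as averaging pairwise agreement between individual raters, would give different figures; the sketch is intended only to make the idea of item-level binary agreement concrete.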
In summary, the Taiwan group rated some items in both Section 3 (worry and tension) and Section 4 (anxiety) higher than the US-UK group. The Taiwan group also rated some of the affective symptoms higher (Sections 6, 7 and 10). In Section 11 (alcohol) some scaled items were rated higher by the Taiwan group and some by the US-UK group, but there were no differences in Section 12 (other substances). In Sections 16, 17 and 18 (perceptual changes, auditory hallucinations and thought disorder) there were several items rated higher by the Taiwanese group, but none in Section 19 (delusions).
DISCUSSION
Translation of symptoms
The emphasis on the equivalence of experiences of the patients at the level
of symptoms ensures as much cross-cultural equivalence as possible and also
avoids the potential problem that is sometimes called the "category
fallacy". This term refers to the warning given by some medical and
social anthropologists that psychiatrists will be likely to arrive at
misleading conclusions if they automatically apply the diagnostic concepts
that they have become familiar with in their own culture to patients from
different cultures (Kleinman & Good,
1985). Cross-cultural studies in which the Present State Examination (PSE)
or SCAN have been used (e.g. many of those coordinated by WHO) therefore
minimise this problem.
It was apparent in the discussions that the major area in which the refinements were needed was not so much in the formal translation of the clinical definitions contained in the SCAN items and Glossary, but more in the vernacular terms used to operationalise communication of clinical concepts to the patients. It may be useful to divide the cross-cultural validity and equivalence issue into two parts: the terms, concepts and knowledge of languages needed to maximise communication between the professional research staff; and the terms and more vernacular style of language used to communicate these clinical concepts to the subjects.
Differences between Taiwanese and US-UK raters
Turning next to our quantitative findings, we found that although few
individual items had group ratings that were significantly different, there
were significant differences between items grouped by SCAN section. As
expected, the Taiwan scaled ratings of anxiety symptoms were higher than the
US-UK ratings. Although this difference disappeared for the section (Section
4) that covers panic and phobias when the ratings were transformed from scaled
SCAN ratings to binary presence or absence of clinically significant symptoms,
the difference persisted for non-specific symptoms (Section 3), consistent
with the a priori hypothesis. It is possible that the difference
could have an effect on studies of neurotic anxiety disorders.
On the other hand, the expected higher level of ratings of depressive symptoms by US-UK raters did not occur, and in fact the opposite result was observed, with the mean Taiwan rating for items in the mood sections (Sections 6, 7 and 10) being higher. This prediction was based upon the results of studies that found comparatively low rates for depressive disorders in Taiwan (Compton et al, 1991; Weissman et al, 1996). However, because the differences between the Taiwan and US-UK groups were decreased when the items were dichotomised, the effect on diagnosis of depressive disorders may not be large. As with Section 3 and neurotic anxiety disorders, this needs to be kept in mind in future studies.
The higher US-UK rating for alcohol use disorders (Section 11) was not expected and did not persist after the scaled ratings were dichotomised. As with the results for other sections, this result suggests that it is important to distinguish scaled ratings from the basic presence or absence of clinically significant symptoms. When diagnosis is the main focus of a study, binary ratings should be sufficient.
Within the psychotic symptoms, Section 18 (thought disorder) stood out as being rated significantly higher by the Taiwan group than by the US-UK group, with both scaled and binary ratings. In SCAN training, particular attention is paid to these items, emphasising the need to avoid false-positive ratings. The group discussions clearly indicated that the Taiwan group rated too liberally with these items, and based the decision too much upon the first reply of the patient to the structured probe rather than on the clinical judgement of the interviewers after further questions and answers (a possible contributing cause for this has been noted already). The fact that ratings of the other psychotic symptom sections (Sections 16, 17 and 19) by the Taiwan group were not significantly different from the ratings by the US-UK group is partial evidence for this possibility, because Section 18 items require a higher standard of clinical evidence than items in the other sections. Alternatively, it is possible that the US-UK group rated too low on the Section 18 items because of information being changed or lost in translation.
In summary, the cross-cultural differences between the two groups were lessened when items were converted from scaled ratings to group percentage agreements for binary ratings, which were generally high.
Issues for future SCAN development and training
Even though the first draft of the Chinese translation of SCAN with which
this study started had been the result of a great deal of careful translation
work, the novel methods used in this study showed that further improvements
were still possible. It seems that to capitalise on the efforts made in
standard SCAN training, follow-up training and intermittent joint discussions
can be of great value. Cross-training in different centres is stipulated by
the SCAN training material, but there has not been a positive requirement for
detailed follow-up monitoring of SCAN performance by local raters using the
native language of a new SCAN centre. To avoid drift of rating and some of the
problems encountered in this study, translation and reliability exercises
similar to the one described here are recommended. This should be balanced
with an awareness of the amount of work required, but for implementation of
the SCAN in a major language the work is justified.
Clinical Implications and Limitations
LIMITATIONS
ACKNOWLEDGMENTS
REFERENCES
Compton, W., Helzer, J., Hwu, H., et al (1991) New methods in cross-cultural psychiatry: psychiatric illness in Taiwan and the United States. American Journal of Psychiatry, 148, 1697-1704.
Easton, C., Meza, E., Mager, D., et al (1997) Test-retest reliability of the alcohol and drug use disorder sections of the Schedules for Clinical Assessment in Neuropsychiatry (SCAN). Drug and Alcohol Dependence, 47, 187-194.
Kleinman, A. & Good, B. (1985) Culture and Depression: Studies in the Anthropology and Cross-Cultural Psychiatry of Affect and Disorder. Los Angeles, CA: University of California Press.
Sartorius, N. & Kuyken, W. (1993) Translation of health status instruments. In Quality of Life Assessments in Health Care Settings. Berlin: Springer-Verlag.
Weissman, M. M., Bland, R. C., Canino, G., et al (1996) Cross-national epidemiology of major depression and bipolar disorder. Journal of the American Medical Association, 276, 293-299.
Wing, J. K. (1996) SCAN and the PSE tradition. Social Psychiatry and Psychiatric Epidemiology, 31, 50-54.
Wing, J. K., Babor, T., Brugha, T., et al (1990) SCAN: Schedules for Clinical Assessment in Neuropsychiatry. Archives of General Psychiatry, 47, 589-593.
Wing, J. K., Cooper, J. & Sartorius, N. (1974) Measurement and Classification of Psychiatric Symptoms. New York: Cambridge University Press.
Wing, J. K., Sartorius, N. & Der, J. (1998a) International field trials: SCAN-0. In Diagnosis and Clinical Measurement in Psychiatry: A Reference Manual for SCAN/PSE-10 (eds J. Wing, N. Sartorius & T. Ustun), pp. 86-109. Cambridge: Cambridge University Press.
Wing, J. K., Sartorius, N. & Ustun, T. B. (eds) (1998b) Diagnosis and Clinical Measurement in Psychiatry, a Reference Manual for SCAN. Cambridge: Cambridge University Press.
World Health Organization (1993) The ICD-10 Classification of Mental and Behavioural Disorders: Diagnostic Criteria for Research. Geneva: WHO.
World Health Organization (1994) SCAN: Schedules for Clinical Assessment in Neuropsychiatry. Geneva: WHO.
World Health Organization (1999) SCAN 2.1: Schedules for Clinical Assessment in Neuropsychiatry. Cambridge: Cambridge University Press.
Received for publication August 16, 2000. Revision received December 5, 2000. Accepted for publication December 12, 2000.