a Division of Rehabilitation and Ageing, University of Nottingham, Nottingham, UK.
b Trent Institute for Health Services Research, University of Nottingham, Nottingham, UK.
c The TOTAL Study Group. Nottingham: John Gladman, Avril Drummond, Michael Dewey, Nadina Lincoln, Chris Parker, Philippa Logan, Kate Radford; Aintree: Department of Medicine for the Elderly Research Team: Anil Sharma, Caroline Watkins, Michael Leathley, Hazel Dickinson, Elaine Mackie, Jan Rhodes, Liz Lightbody; OT service: Julie Murray, Geralyn Lennon, Carol Mullarkey, Hazel McCormick, Val Chisnall, Sean McCann; Bristol: Meg Birch, Helen Smith, Dave Gamm, Mary Vincent, Sally Chorlton, Dee Sessions; Edinburgh: Martin Dennis, Alison Chalmers, Lesley Moffat; Glasgow: Peter Langhorne, Joyce Peters, Karen Blackwood, Louise Gilbertson; Newcastle: David Barer, Susan Fall.
Reprint requests: Dr ME Dewey, Trent Institute for Health Services Research, Floor B Medical School, Queens Medical Centre, Nottingham NG7 2UH, UK. E-mail: michael.dewey@nottingham.ac.uk
Abstract
Methods: Questionnaires containing a number of established scales were posted to participants in a trial of occupational therapy after stroke. Response was maximized by telephone and postal reminders, and incomplete questionnaires were followed up by telephone. Scale scores obtained by imputing values to questionnaire items missing on return were compared with those achieved by telephone follow-up.
Findings: Response to the initial posting was 60%, rising to 85% after reminders. Participants receiving the experimental treatment were more likely to respond without a reminder. There were no significant differences on any known factors between eventual responders and non-responders. Of the returned questionnaires, 43% were incomplete: partial responders differed significantly from complete responders in baseline disability and home circumstances. Of the incomplete questionnaires, 71% were resolved by telephone follow-up. In these, the scale scores achieved by telephone were generally higher than those derived by conventional imputation.
Conclusion: Postal outcome assessment achieved a good response rate, but considerable effort was needed to minimize non-response and incomplete response, both of which could have been serious sources of bias.
Keywords: Research outcomes, bias, postal questionnaires, stroke therapy
Accepted 8 May 2000
Introduction
There are also practical difficulties in conducting face-to-face interviews or assessments in the community. The costs can be substantial, particularly if the study population is geographically dispersed. Study participants who move out of the area may be lost to follow-up. Standardizing interview or assessment procedures across a large study requires extensive training at the outset and periodically thereafter, particularly in a lengthy study when there may be substantial staff turnover.
Measurement of outcomes by postal questionnaire offers both methodological and practical advantages over home visits. The potential bias introduced by interviewer or assessor is avoided, and the measurement procedure can be standardized across a large study. Respondents may answer sensitive questions more honestly in a self-completed questionnaire.5 Costs are considerably reduced,6 and provided that participants who move can be traced there is no additional difficulty or cost in obtaining their data. However, some outcomes are clearly unsuitable for postal assessment, and in others the data obtained by this method may be of poor quality. Response rates to postal questionnaires can be low, introducing bias if responders differ from non-responders.7 Returned questionnaires may contain missing, unclear or invalid item responses, making the calculation of scale scores problematic and again potentially introducing bias. Both non-response and incomplete response reduce the effective sample size, making it more likely that a true effect will not be detected.
The aim of this report is to describe the practical and methodological implications of a postal assessment process enhanced by telephone contact. The results presented here derive from a rehabilitation trial with a large multi-centre sample and a number of well-known outcome scales, and are relevant to planning future studies using postal assessment of outcome, both in rehabilitation research and more widely.
Methods
Outcome assessments were conducted by post from Nottingham 6 months and 12 months after random allocation: this paper presents data from the 6-month assessment. The follow-up procedure was conducted by the trial secretary, who was masked to group allocation. Each centre provided details of patient deaths and changes of address. The patient follow-up questionnaire contained a number of scales with established postal use: (1) the Barthel ADL Index,10 a 10-item scale of self-care ADL ability; (2) the Nottingham Extended ADL scale (NEADL),11 a 22-item scale of instrumental ADL such as outdoor mobility, household tasks and leisure pursuits; (3) the Nottingham Leisure Questionnaire (NLQ),12,13 a 30-item scale of leisure activity; (4) the General Health Questionnaire, 12-item version (GHQ12),14 a widely used measure of emotional health; and (5) the London Handicap Scale (LHS),15 a 6-section scale of handicap.
The questionnaire was designed for ease of completion, with large font size and multiple choice tick boxes for each question. Any carer helping the patient to complete the questionnaire was asked to record their relationship to the patient. A carer questionnaire was also included, with items on the level of care provided and the GHQ12. A covering letter reminded patients of the purpose of the trial and emphasized the value of each participant's contribution.
If no response was received within 2–3 weeks an attempt was made to contact the patient by telephone, and a further questionnaire was posted if the original had been mislaid or if telephone contact was not achieved after repeated attempts. Non-responders were traced through computerized hospital systems and GP surgeries with the help of staff at each centre. Each patient received a maximum of one telephone reminder and one postal reminder. Any incomplete or ambiguous information in returned questionnaires was clarified by telephone if contact could be made. Items resolved by telephone were coded for data entry so that both the nature of the problem (response missing or ambiguous) and the response achieved by telephone contact were captured.
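As an illustration of the item coding described above, the following is a minimal sketch of a record that keeps the postal value, the nature of the problem and the telephone value side by side. The class, field and label names are hypothetical and are not taken from the trial's data-entry system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Problem(Enum):
    """Why an item needed telephone follow-up (hypothetical labels)."""
    NONE = "none"            # answered clearly on the returned questionnaire
    MISSING = "missing"      # no response given
    AMBIGUOUS = "ambiguous"  # e.g. more than one box ticked


@dataclass
class ItemRecord:
    item_id: str                            # hypothetical identifier, e.g. "NEADL_07"
    postal_value: Optional[int]             # value as returned by post, if usable
    problem: Problem = Problem.NONE
    telephone_value: Optional[int] = None   # value obtained by telephone, if any

    def final_value(self) -> Optional[int]:
        """Use the postal answer when it was clear, else the telephone answer."""
        if self.problem is Problem.NONE:
            return self.postal_value
        return self.telephone_value

    def resolved(self) -> bool:
        """True when a usable value exists from either source."""
        return self.final_value() is not None
```

Keeping both values, rather than overwriting the postal entry, is what allows imputed and telephone-derived scale scores to be compared later.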
Statistical methods
The relationship between response and known characteristics of participants was examined using logistic regression. The characteristics entered into the regression were those collected at baseline, such as age, sex and level of disability, as well as treatment group and whether a carer helped the patient to complete the questionnaire. Baseline age and Barthel score were treated as categorical variables to avoid constraining them to have a linear effect on the outcomes. The odds ratios (OR) presented are those derived from the logistic models and are adjusted for all other significant predictors of that outcome.
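The sketch below shows, under stated assumptions, how adjusted odds ratios of this kind arise from a logistic model with a categorical predictor. It is not the trial's analysis code: the fitting routine is a plain Newton-Raphson implementation written for illustration, the three Barthel bands and their labels are hypothetical, and the counts are invented.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, iterations=25):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    X: rows of predictors, each with a leading 1.0 for the intercept.
    y: 0/1 outcomes (1 = questionnaire returned).
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iterations):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for row, yi in zip(X, y):
            eta = sum(b * x for b, x in zip(beta, row))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * row[j]
                for k in range(p):
                    hess[j][k] += w * row[j] * row[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def design_row(treated, band, bands=("low", "mid", "high")):
    """Intercept, treatment indicator, then one dummy per non-reference band."""
    return [1.0, float(treated)] + [1.0 if band == b else 0.0 for b in bands[1:]]


# Invented counts for illustration only: (treated, band, responded, did not respond).
cells = [(0, "low", 8, 6), (0, "mid", 12, 5), (0, "high", 15, 4),
         (1, "low", 10, 4), (1, "mid", 14, 3), (1, "high", 18, 3)]
X, y = [], []
for treated, band, yes, no in cells:
    X += [design_row(treated, band)] * (yes + no)
    y += [1] * yes + [0] * no

beta = fit_logistic(X, y)
# Adjusted odds ratios: treatment, mid vs low band, high vs low band.
odds_ratios = [math.exp(b) for b in beta[1:]]
print(odds_ratios)
```

Coding the Barthel score as separate indicator variables lets each band have its own odds ratio relative to the reference band, which is the point of treating it as categorical rather than as a single linear term.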
The conventional imputation strategy appropriate to the scale was used to compute scale scores when one or more item responses were missing. Thus for the Barthel, NEADL (scored 0,1,2,3) and NLQ (scored 0,1,2), a missing response was treated as 'never' or 'not at all' and counted zero towards the total score. For the GHQ12 (scored 0,1,2,3), a missing response was treated as 'same as usual' and counted one towards the total. The scoring system for the LHS does not allow a total to be computed if any item is missing. Multiple responses to single-choice items were treated in the same way as missing responses.
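The imputation rules above can be sketched as follows. This is a minimal illustration that assumes each scale total is a simple sum of item codes and that missing or multiply ticked items are stored as None; the function names and the example responses are hypothetical. The LHS is left out because no total is computed for it when any item is missing.

```python
from typing import Dict, Optional, Sequence

# Value counted for a missing item under the conventional strategy.
IMPUTED_VALUE = {"Barthel": 0, "NEADL": 0, "NLQ": 0, "GHQ12": 1}


def imputed_score(scale: str, items: Sequence[Optional[int]]) -> int:
    """Total score with missing items (None) replaced by the scale's default."""
    fill = IMPUTED_VALUE[scale]
    return sum(fill if v is None else v for v in items)


def telephone_score(items: Sequence[Optional[int]],
                    answers: Dict[int, int]) -> Optional[int]:
    """Total score using telephone answers, keyed by item position.

    Returns None if any item is still unresolved after follow-up.
    """
    filled = [answers.get(i) if v is None else v for i, v in enumerate(items)]
    if any(v is None for v in filled):
        return None
    return sum(filled)


# Invented responses to four NEADL-style items (0-3), two of them missing.
postal = [3, None, 2, None]
by_phone = {1: 2, 3: 0}

print(imputed_score("NEADL", postal))     # 3 + 0 + 2 + 0 = 5
print(telephone_score(postal, by_phone))  # 3 + 2 + 2 + 0 = 7
```

Because a missing item counts zero on the Barthel, NEADL and NLQ, the imputed total on those scales can never exceed the total obtained once the same items are answered by telephone; on the GHQ12, where a missing item counts one, the telephone total may fall on either side.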
Results
The only significant predictor of eventual return of the questionnaire was centre (Table 2). There was a non-significant trend towards lower response among patients with a lower baseline Barthel score (P = 0.14).
Discussion
Responders who needed reminders differed from those who did not: patients in the leisure treatment group (the experimental treatment in this trial) were more likely to respond to the initial posting. Postal assessment without the capacity to remind non-responders would therefore have resulted in a higher proportion of followed-up patients in this group, raising the possibility of bias. Previous research has found differences between participants who respond readily and those needing more persistence,16 but the relationship between readiness to respond and treatment group has particular implications for trials in which masking of participants and clinicians is not possible, and may indicate that the new treatment generated greater enthusiasm. There is no evidence that the non-response remaining after reminders was a source of bias, as the probability of eventually returning a questionnaire was not significantly related to treatment group or to any of the baseline factors such as age or disability. There were, however, significant differences in response rates across centres, which was perhaps surprising given that the follow-up procedure was conducted centrally. Possible explanations include differences between the populations and varying encouragement to respond by local therapists: if the latter was an important factor, it would have implications for maximizing response in future multi-centre studies.
Almost half the returned questionnaires had some missing or ambiguous item responses, a much higher proportion than that typically found in questionnaire surveys of the general population.7 Incomplete responders differed significantly from complete responders: they were more likely to have had no help with completion, to live alone, and to have been at either end of the disability spectrum at baseline. Failure to use data from incomplete responders would therefore have presented another opportunity for bias to enter the study.
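To give a sense of scale, the short calculation below combines the percentages reported in the Findings. It is only a rough sketch: it treats the rounded percentages as exact, takes 43% as a share of returned questionnaires and 71% as a share of incomplete ones, and the variable names are ours.

```python
# Percentages as reported in the Findings, treated here as exact.
returned = 0.85                   # share of posted questionnaires eventually returned
incomplete_given_returned = 0.43  # share of returned questionnaires that were incomplete
resolved_given_incomplete = 0.71  # share of incomplete questionnaires resolved by telephone

complete_on_return = returned * (1 - incomplete_given_returned)
resolved_by_phone = returned * incomplete_given_returned * resolved_given_incomplete
usable_with_follow_up = complete_on_return + resolved_by_phone

print(f"complete on return:        {complete_on_return:.3f}")
print(f"resolved by telephone:     {resolved_by_phone:.3f}")
print(f"complete after follow-up:  {usable_with_follow_up:.3f}")
```

On these assumptions, roughly 48% of posted questionnaires came back fully complete, and telephone follow-up raised the share with complete data to roughly 74%.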
The outcome of the main study depended not on the individual item responses but on the scale scores derived from them. We compared two ways of calculating scale scores when one or more item responses were missing: the simple, conventional imputation strategy and telephone follow-up. Neither method represents a gold standard for filling in missing responses. Conventional imputation is a broad-brush technique making assumptions that are unlikely to be justified. There are of course more complex imputation techniques which are mainly used in large-scale population surveys.17 The response achieved by telephone may not have accorded perfectly with the one which would have been entered on the questionnaire. The method of administration has been found to influence response to questions, for example in depression scales with similar items to the GHQ.18 The task of resolving incomplete questionnaires by telephone was large in terms of the number of patients to contact but was usually limited to a small number of items per questionnaire. Patients were generally willing and able to answer items by telephone, although other studies have identified problems in conducting telephone interviews with elderly respondents.19
We found that conventional imputation underestimated scale scores measuring independence in activities of daily living, involvement in leisure activities, and emotional ill-health, relative to telephone contact. As the number of problem items per scale was typically small, the resulting difference in scores was also small on average, but for some scales it affected a significant proportion of the followed-up sample. The analysis of the main trial used the scores achieved by telephone as the less imperfect of the two available measures.
In summary, postal questionnaires offer a practical method of collecting research outcomes, avoiding the potential bias related to interviewer or independent assessor, but should not be considered an option requiring little effort. Failure to invest in the administration time needed to maximize the quantity and quality of information obtained could replace one source of bias by others which are equally serious.
Acknowledgments
References
2 Sackett DL. Bias in analytic research. J Chron Dis 1979;32:51–63.
3 Siemonsma PC, Walker MF. Practical guidelines for independent assessment in randomised controlled trials of rehabilitation. Clin Rehab 1997;11:273–79.
4 Noseworthy JH, Ebers GC, Vandervoort MK, Farquhar RE, Yetisir E, Roberts R. The impact of blinding on the results of a randomised, placebo-controlled multiple sclerosis trial. Neurology 1994;44:16–20.
5 Schwarz N, Strack F, Hippler HJ, Bishop G. The impact of administration mode on response effects in survey measurement. Appl Cognit Psychol 1991;5:193–212.
6 Siemiatycki J. A comparison of mail, telephone, and home interview strategies for household health surveys. Am J Public Health 1979;69:238–45.
7 Foster K. Evaluating Non-response on Household Surveys. Government Statistical Service Methodology Series No. 8. London: Office for National Statistics, 1998.
8 Parker CJ, Gladman JG, Drummond AER et al. on behalf of the TOTAL study group. A multi-centre randomised controlled trial of leisure therapy and conventional occupational therapy after stroke. Submitted.
9 Van Swieten JC, Koudstaal PJ, Visser MC, Schouten HJA, Van Gijn J. Interobserver agreement for the assessment of handicap in stroke patients. Stroke 1988;19:604–07.
10 Mahoney FI, Barthel DW. Functional evaluation: the Barthel Index. Maryland State Med J 1965;14:61–65.
11 Nouri FM, Lincoln NB. An extended activities of daily living scale for stroke patients. Clin Rehab 1987;1:301–05.
12 Drummond AER, Walker MF. The Nottingham Leisure Questionnaire for stroke patients. Br J Occup Ther 1994;57:414–18.
13 Parker CJ, Logan PA, Gladman JRF, Drummond AER. A shortened version of the Nottingham Leisure Questionnaire. Clin Rehab 1997;11:267–68.
14 Goldberg D, Williams P. A User's Guide to the General Health Questionnaire. Windsor: NFER-NELSON, 1992.
15 Harwood R, Gompertz P, Ebrahim S. Handicap one year after stroke: validity of a new scale. J Neurol Neurosurg Psychiatry 1994;57:825–29.
16 Rao PSRS. Callbacks, follow-ups, and repeated telephone calls. In: Madow WG, Olkin I, Rubin DB (eds). Incomplete Data in Sample Surveys. Vol. 2: Theory and Bibliographies. New York: Academic Press, 1983, pp.33–44.
17 Report of the Task Force on Imputation. Government Statistical Service Methodology Series No. 3. London: Office for National Statistics, 1996.
18 Geerlings SW, Beekman ATF, Deeg DJH, Van Tilburg W, Smit JH. The Center for Epidemiologic Studies Depression scale (CES-D) in a mixed-mode repeated measurements design: sex and age effects in older adults. Int J Meth Psychiatr Res 1999;8:102–09.
19 Wilson K, Roe B. Interviewing older people by telephone following initial contact by postal survey. J Adv Nurs 1998;27:575–81.