Comparison of Telephone and Postal Survey Modes on Respiratory Symptoms and Risk Factors

Jan Brøgger1, Per Bakke1, Geir E. Eide2 and Amund Gulsvik1

1 Department of Thoracic Medicine, Institute of Medicine, University of Bergen, Bergen, Norway.
2 Centre for Clinical Research, Haukeland University Hospital, and Section for Medical Statistics, University of Bergen, Bergen, Norway.


    ABSTRACT
 
Little is known about the comparability of postal and telephone survey modes in epidemiology. A cross-sectional, population-based study (n = 25,000) of lung disease was performed in 1998–1999 in two regions of Norway. Initial surveying was done by postal questionnaire. A 1% random sample (n = 171) of previous postal responders was resurveyed by telephone or cellular contact. The response rate was 67% on the telephone/cellular interview. Fewer incomplete answers were given by telephone than by mail. A lower prevalence was found by telephone for morning cough and exposure to passive smoking at work or home. Reproducibility was high for asthma, hay fever, wheezing, and attacks of breathlessness. Moderate reproducibility was seen for symptoms of chronic bronchitis. Reproducibility was low for indoor and work environment, although it was high for early life factors. Concordance coefficients were high for all continuous measures such as height, body weight, and pack-years. The authors conclude that the comparability between the postal and the telephone survey modes was good. The telephone survey mode gave more complete information. Survey mode may have a moderate effect on study results, depending on the specific questions asked.

data collection; health surveys; interviews; postal service; telephone

Abbreviations: CI, confidence interval


    INTRODUCTION
 
Common data collection instruments in epidemiology are postal questionnaires and telephone interviews. The comparability of these two survey modes needs to be studied. First of all, different researchers in any given epidemiologic field may be using different survey methods, so comparability of results is desirable (1).

Second, response rates are falling (2), so that surveys using both survey modes may become more common (3, 4). For example, an initial postal survey may be followed by a telephone survey of nonresponders to the postal survey. This will increase response rates, thus reducing nonresponse bias. However, there is a risk of introducing another bias, if responses to one survey method are systematically different from those of the other survey method.

There are several ways of comparing mail and telephone survey methods (1). Some have randomized half the population to the postal method and the other half to the telephone method (5–7). Others have compared postal responders with a telephone survey of postal nonresponders (8–10). The populations reached by the two survey methods may differ, so these studies are unable to separate the effect of the survey method itself from the nonresponse bias due to survey method. Few epidemiologic studies have directly compared responses from the same subjects when surveyed by both postal questionnaire and telephone interview (11, 12).

The aim of the present study was to investigate the comparability of postal and telephone survey modes, through a telephone reinterview of responders to a postal survey, with respect to prevalence estimates, completeness, and reproducibility.


    MATERIALS AND METHODS
 
The design was a cross-sectional study of the adult population in two regions of Norway, the capital Oslo and the county of Hordaland. The sampling frame was inhabitants listed in the Central Population Register who were aged 15–70 years on December 31, 1997. Random samples of 20,000 (Oslo) and 5,000 (Hordaland) inhabitants were selected.


    Fieldwork
 
The study started with the mailing of a postal questionnaire. After 3 and 8 weeks a reminder letter was sent out. Four months after the beginning of the study, a telephone follow-up was initiated for a 1 percent random sample of previous postal responders (n = 171). Telephone numbers for the follow-up sample were located using a direct-mail firm (D. M. Huset, Oslo, Norway), matching on name, date of birth, and address. Six months after the beginning of the study, computer-assisted telephone interviews of these previous postal responders were performed by trained operators. A standardized interview protocol was followed. Nine months after the study began, cellular telephone numbers were located for those who did not have a land-line telephone. This was done through a search of the cellular phone companies' customer databases. A similar cellular telephone interview was performed.

All mailings included an addressed envelope and return postage. The mailing and processing of the questionnaires were performed by Statistics Norway, the Norwegian government's statistics and survey agency.

The sample size was chosen to provide reasonably precise estimates of reliability and to detect only large differences in prevalence. With 115 responders, a kappa of 0.70, and symmetric prevalences of 10 percent and 25 percent, the widths of the 95 percent confidence intervals would be 0.16 and 0.11, respectively. For a kappa of 0.90, they would be 0.10 and 0.07.


    Questionnaire
 
The symptom questions were taken from a modified translation of the questionnaire by the British Medical Research Council's committee on chronic bronchitis (13, 14), and they have been compared with a direct translation (15). A one-sheet questionnaire designed for self-completion was used. It contained 67 questions on respiratory symptoms, cardiorespiratory diagnoses, smoking habits, and risk factors. Thirty-five of these questions were meant to be answered by all participants. Three questions on heart disease and tuberculosis were not analyzed. Smokers were defined as daily smokers at the time of the study. Former smokers were persons who had smoked daily and had given it up. Pack-years were analyzed among smokers and former smokers only. The questionnaire used in the present study is available on the following Web site (http://www.brogger.no/oh98tlf/), as is an expanded version of the table, including all cell counts.

The study was approved by the regional ethics committee and the Norwegian Data Inspectorate.


    Statistical methods
 
For comparisons between the survey modes, prevalences and odds ratios with 95 percent confidence intervals were estimated. Agreement and Cohen's kappa (16, 17) were calculated for reproducibility. For ordinal data, Wilcoxon's signed rank test was used to compare distributions, and Cohen's unweighted kappa was computed for reproducibility. For continuous data, the confidence intervals for medians were computed and Lin's concordance coefficient (18) was computed for reproducibility. For height and weight, a normal distribution was assumed. Confidence intervals for the mean number of missing answers on each questionnaire were computed using the negative binomial distribution (19). A global test of the effect of survey mode on outcomes was performed. The difference between survey modes for each outcome was summed for each person. The median of this sum across all subjects was tested with Wilcoxon's signed rank test. The influence of age, gender, and region on the chance of postal response, having a telephone, and telephone response was assessed by logistic regression. Cases with missing answers with one or both survey modes were excluded from analyses involving that answer. All data were analyzed using Stata 7.0 software (20).
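As an illustration of the reproducibility measures for a yes/no question, the following is a minimal Python sketch, not the Stata code used in the study. It computes observed agreement and Cohen's kappa for paired answers and drops any subject with a missing answer in either mode, as described above. The function name and the coding of a missing answer as `None` are assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple


def agreement_and_kappa(
    postal: Sequence[Optional[bool]],
    telephone: Sequence[Optional[bool]],
) -> Tuple[int, float, float]:
    """Return (pairs used, observed agreement, Cohen's kappa) for yes/no answers."""
    # Exclude subjects with a missing answer (None) in one or both survey modes.
    pairs = [
        (p, t) for p, t in zip(postal, telephone)
        if p is not None and t is not None
    ]
    n = len(pairs)
    if n == 0:
        raise ValueError("no complete pairs of answers")

    both_yes = sum(1 for p, t in pairs if p and t)
    both_no = sum(1 for p, t in pairs if not p and not t)
    observed = (both_yes + both_no) / n

    # Agreement expected by chance, from the marginal prevalence in each mode.
    yes_postal = sum(1 for p, _ in pairs if p) / n
    yes_phone = sum(1 for _, t in pairs if t) / n
    expected = yes_postal * yes_phone + (1 - yes_postal) * (1 - yes_phone)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")

    kappa = (observed - expected) / (1 - expected)
    return n, observed, kappa


# Hypothetical answers from six subjects; the fifth did not answer by post.
postal_answers = [True, True, False, False, None, True]
phone_answers = [True, False, False, False, True, True]
print(agreement_and_kappa(postal_answers, phone_answers))
```

Kappa rescales the observed agreement by the agreement expected from the two marginal prevalences alone, which is why it is reported alongside, rather than instead of, the raw agreement.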


    RESULTS
 
The postal response rate was 68 percent, leaving 17,083 responders available for selection for the follow-up, from which 171 were randomly selected. A land-line telephone was located for 141 subjects (83 percent), and a cellular telephone was located for 15 of the 30 remaining subjects, giving an overall coverage of 91 percent. A telephone interview was achieved with 115 of the available subjects (74 percent). Of the 56 subjects for whom no interview was achieved, 15 did not have a land-line telephone or a cellular telephone, seven were not reached or had moved, 13 were reached but refused participation, and 21 had other or unknown reasons for nonresponse.


    Completeness
 
Of the 35 questions and one derived smoking habit variable, the mean number of incomplete questions was 2.8 (95 percent confidence interval (CI): 2.2, 3.6) with the postal survey and 0.6 (95 percent CI: 0.4, 0.8) with the telephone survey (p for the difference < 0.001). Few questions had any appreciable missing answers with the telephone survey mode.

In the postal survey, 17 questions had more than 5 percent missing answers. All symptoms had less than 5 percent missing answers except "episodes of phlegm or cough" (7 percent missing). The common diagnoses of asthma and bronchitis had 8 percent and 6 percent missing, while emphysema had 16 percent. Questions about eczema and hay fever had 28 and 31 percent missing, respectively. Other questions that had high rates of missingness were early life exposures (7–14 percent) and asthma in the family (14–23 percent).


    Prevalences and means
 
The largest significant difference in prevalence between survey modes was for morning cough (table 1). Relatively large differences were observed for questions on dyspnea and for indoor environment questions, but with no consistent pattern in any direction.


TABLE 1. Telephone resurvey of previous postal respondents (n = 115), Oslo and Hordaland, Norway, 1998–1999*,†

 
Among ordinal variables, significantly less passive smoking at home was reported with the telephone survey method, with any exposure at 37 percent in the postal survey versus 22 percent in the telephone survey (p = 0.01). Similarly, less passive smoking at work was reported in the telephone survey, with any exposure at 55 percent versus 22 percent (p < 0.001). For smoking, education, and episodes of phlegm and cough, only small differences were observed (p > 0.4).

For pack-years, the number of siblings, and the number of siblings with asthma, there was no significant difference between survey modes (p > 0.15). For height and weight, there were small changes of little significance. The mean increase in height from postal to telephone survey mode was 0.35 cm (95 percent CI: -0.03, 0.73; p = 0.07). For weight, the mean decrease was 0.58 kg (95 percent CI: -1.19, 0.03; p = 0.06). The global test of tendency to report more symptoms or diagnoses with one survey mode was not significant (p = 0.57).


    Reproducibility indices
 
Total agreement was 79 percent or more for all binary (yes/no) questions, and the kappa ranged from 0.32 to 1 (table 1). Agreement was high (>90 percent), and the kappa was excellent at >0.7 for asthma, emphysema, hay fever, early life exposures, and asthma in the family, most of which had low or moderate prevalence. Variables with very low prevalences, such as dyspnea grades 3 and 4, fungus smell, and deformed floor, all had low kappas of <0.5 but high (>95 percent) agreement. Questions with both kappa and agreement low were morning and day cough, dyspnea grade 2, symptoms at work, and moisture at home. Other questions had intermediate kappas and agreements. For ordinal variables, the kappas were good for education (kappa, 0.80) and smoking (kappa, 0.84) and low for passive smoking (kappa, 0.40 at home; kappa, 0.33 at work) and episodes of cough or phlegm (kappa, 0.24).

For continuous variables, concordance coefficients were very high (>0.95) for height, weight, and number of siblings; high for pack-years (0.82, 95 percent CI: 0.69, 0.94); and moderate for number of siblings with asthma (0.76, 95 percent CI: 0.68, 0.84).
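For readers unfamiliar with the concordance coefficient, a minimal Python sketch of Lin's concordance correlation coefficient follows. It uses the 1/n moment estimators and is an illustration only, not the routine used for the analyses. The function name and the height values are hypothetical.

```python
from typing import Sequence


def lin_ccc(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient for paired measurements."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Variances and covariance with divisor n (moment estimators).
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("coefficient is undefined for constant, identical data")
    # A systematic shift between modes enlarges the denominator and lowers the result.
    return 2 * cov_xy / denominator


# Hypothetical heights (cm): every telephone answer is 1 cm above the postal one.
postal_height = [170.0, 180.0, 165.0]
phone_height = [171.0, 181.0, 166.0]
print(round(lin_ccc(postal_height, phone_height), 3))
```

Unlike the Pearson correlation, which would be exactly 1 for these data, the concordance coefficient falls below 1 whenever the two modes differ systematically in level or spread.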


    DISCUSSION
 
We have shown that there were more missing answers using the postal survey mode than the telephone mode. Prevalences were similar for most questions, except for questions on cough, dyspnea, episodes of cough and phlegm, and passive smoking. Reproducibility was good for continuous variables, diagnoses, and asthma in the family but poor for dyspnea, cough, and indoor environment.

The methodological limitations of the present study are as follows. The study was not powered to find small differences in categorical outcomes. Hence, for the rarer symptoms, such as dyspnea grades 3 and 4, a large difference in prevalence was found but with wide confidence intervals. There will be a substantial cost associated with discovering differences in the effect of survey method on outcomes with low prevalence. Similar considerations apply to the precision in kappa estimates.

Because of the time span between surveys, bias due to recall of previous answers should be unlikely. However, a real incidence or remission of symptoms may have influenced the results. Moreover, for questions that ask for events that may be remote, for example, ever wheeze, the event may be remembered at the first survey but forgotten at the second. Few data are available on 6–9 months' reliability and incidence and remission of respiratory symptoms.

Nonresponse bias is another problem. The responders to both survey modes represented only 46 percent of the target population (0.68 × 0.91 × 0.74). The response at different steps in the survey was variably dependent on age, gender, and region (data not shown). The response rates achieved here compare favorably with those of other surveys (21). A study of this type will always be susceptible to nonresponse bias, because of its sequential design. Even a 90 percent response at each stage (postal, telephone directory search, telephone response) would give a response rate of 73 percent.
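The cumulative response of a sequential design is the product of the stage-wise proportions. The short Python sketch below recomputes it from the counts given in the Results section and for the hypothetical case of 90 percent response at each of three stages. The constant and function names are illustrative assumptions.

```python
from math import prod

# Counts as reported in the Results section of this paper.
SAMPLED = 25_000            # persons drawn from the population register
POSTAL_RESPONDERS = 17_083  # responders to the postal questionnaire
FOLLOWUP_SAMPLE = 171       # 1 percent random sample of postal responders
PHONE_LOCATED = 141 + 15    # land-line plus cellular numbers located
INTERVIEWED = 115           # completed telephone interviews


def cumulative_response(stage_rates):
    """Multiply stage-wise response proportions into an overall proportion."""
    return prod(stage_rates)


observed_stages = [
    POSTAL_RESPONDERS / SAMPLED,      # postal response
    PHONE_LOCATED / FOLLOWUP_SAMPLE,  # telephone coverage
    INTERVIEWED / PHONE_LOCATED,      # telephone response among those located
]
observed = cumulative_response(observed_stages)
hypothetical = cumulative_response([0.90, 0.90, 0.90])

print(f"observed cumulative response: {observed:.3f}")
print(f"three stages at 90 percent:   {hypothetical:.3f}")
```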

Few epidemiologic studies have considered completeness of information between survey modes (item nonresponse). In our study, there was a clear decrease in missing information with the telephone survey mode, as has been found in a previous study (11). This may lead to differential nonresponse bias in comparisons between studies.

Reliability measurements for binary variables have limitations due to the scarcity of information contained in the 2 × 2 table from which they are derived (22). Thus, we chose to consider prevalence, kappa, and agreement.

For comparisons between different studies where one has used the postal mode and the other has used the telephone survey mode, our results are encouraging. Any differences in prevalence between studies are likely to be due to real differences, except for passive smoking, morning cough, and possibly questions on dyspnea. There appeared to be no systematic tendency for under- or overreporting of symptoms or diagnoses in the telephone survey relative to the postal survey. We speculate that the particularly large difference in morning cough between survey modes is related to its being the first question in the telephone interview. It is possible that respondents want to please the interviewer by providing positive answers, and this effect may be more pronounced for the first question. A social desirability bias may be involved for questions on passive smoking. Passive smoking may be seen as socially unwanted behavior, leading people to report less smoking to a telephone interviewer than on a relatively anonymous questionnaire. However, the evidence concerning desirability effects in general is equivocal (1).

For a mixed-mode health survey where postal and telephone surveys have been used, our results show a good comparability of survey methods. Agreement is good for most variables. For questions with a low prevalence, we encountered the problem that kappa is low while agreement is high (23), while for other rare exposures both kappa and agreement were high, even though prevalences were moderate to low for all these questions. Questions useful for studies of chronic obstructive pulmonary disease had low or intermediate reproducibilities (cough, phlegm, grades of dyspnea). Variables important to asthma and atopic disease epidemiology, such as asthma diagnosis and hay fever, had excellent reproducibilities, while wheezing and attacks of breathlessness were intermediate. Encouragingly, the questions on early life exposures and asthma in the family had high reproducibility, while indoor and work environment questions performed less well. Given the time spans involved, one would expect the opposite pattern. However, there were more missing answers to early life exposures than to environment questions, in line with expectations. In other words, early life exposures have good repeatability among those who feel they remember enough or are confident enough to answer, whereas environmental questions have a moderate repeatability all over.
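The pattern of low kappa with high agreement for rare items can be reproduced with a small numerical example. The Python sketch below compares two hypothetical 2 × 2 tables of 115 subjects with identical observed agreement but very different prevalence. The cell counts are invented for illustration and are not taken from table 1.

```python
def kappa_from_table(both_yes, postal_only, phone_only, both_no):
    """Return (observed agreement, Cohen's kappa) from 2 x 2 cell counts."""
    n = both_yes + postal_only + phone_only + both_no
    observed = (both_yes + both_no) / n
    # Marginal prevalence of a "yes" answer in each survey mode.
    p_postal = (both_yes + postal_only) / n
    p_phone = (both_yes + phone_only) / n
    expected = p_postal * p_phone + (1 - p_postal) * (1 - p_phone)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical rare item: 3 of 115 answer "yes" in each mode.
rare = kappa_from_table(both_yes=1, postal_only=2, phone_only=2, both_no=110)
# Hypothetical common item: about half answer "yes" in each mode.
common = kappa_from_table(both_yes=55, postal_only=2, phone_only=2, both_no=56)

for label, (agreement, kappa) in (("rare", rare), ("common", common)):
    print(f"{label}: agreement {agreement:.3f}, kappa {kappa:.3f}")
```

In both tables only four subjects change their answer, yet chance agreement is so high for the rare item that the same four discordant pairs pull kappa far lower.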

For simple continuous and ordinal data such as height, weight, number of siblings, and education, comparability is excellent. Good repeatability of such factual questions is expected.

A recent study by Galobardes et al. (12) examines the impact of survey method on a respiratory health questionnaire, using a slightly different design. Their comparison of telephone versus postal survey modes involves subjects that were initial postal nonresponders but were then successfully surveyed by telephone and then attended a clinical examination where they completed a questionnaire. The only comparable question is wheezing. They report a kappa of 0.76, which is higher than ours. Another study of ad hoc clinical populations, using only a 2-week interval and the same method of administration, found a higher repeatability for wheeze questions (0.73–0.95 depending on study center) and a comparable repeatability for asthma diagnosis (24).

In conclusion, we found evidence of mode effects on completeness and prevalence. Symptoms of chronic bronchitis had moderate to low reproducibility, while questions relevant to asthma and atopic disease epidemiology had good reproducibility. Less morning cough and passive smoking at home and work were reported with the telephone survey mode. The survey mode may have a moderate effect on study results, depending on the specific questions asked.


    ACKNOWLEDGMENTS
 
The study was supported by the Norwegian Research Council.


    NOTES
 
Correspondence to Dr. Jan Brøgger, Department of Thoracic Medicine, Institute of Medicine, University of Bergen, N-5021 Bergen, Norway (e-mail: jan.brogger@med.uib.no).


    REFERENCES
 

  1. Groves RM. Response effects of the mode of data collection. In: Survey errors and survey costs. New York, NY: John Wiley & Sons, Inc, 1989:501–79.
  2. Hartge P, Cahill J. Field methods in epidemiology. In: Rothman KJ, Greenland S, eds. Modern epidemiology. Philadelphia, PA: Lippincott-Raven, 1998:163–99.
  3. Groves RM. Temporal change in response rates. In: Survey errors and survey costs. New York, NY: John Wiley & Sons, Inc, 1989:145–55.
  4. Hartge P. Raising response rates: getting to yes. (Editorial). Epidemiology 1999;10:105–7.
  5. Helsing K, Comstock G, Speizer F, et al. Comparison of three standardized questionnaires on respiratory symptoms. Am Rev Respir Dis 1979;120:1221–31.
  6. Siemiatycki J. A comparison of mail, telephone, and home interview strategies for household health surveys. Am J Public Health 1979;69:238–45.
  7. Mickey RM, Worden JK, Vacek PM, et al. Comparability of telephone and household breast cancer screening surveys with differing response rates. Epidemiology 1994;5:462–5.
  8. Criqui MH, Barrett-Connor E, Austin M. Differences between respondents and non-respondents in a population-based cardiovascular disease study. Am J Epidemiol 1978;108:367–72.
  9. Brambilla DJ, McKinlay SM. A comparison of responses to mailed questionnaires and telephone interviews in a mixed mode health survey. Am J Epidemiol 1987;126:962–71.
  10. Hill A, Roberts J, Ewings P, et al. Non-response bias in a lifestyle survey. J Public Health Med 1997;19:203–7.
  11. O'Toole BI, Battistutta D, Long A, et al. A comparison of costs and data quality of three health survey methods: mail, telephone and personal home interview. Am J Epidemiol 1986;124:317–28.
  12. Galobardes B, Sunyer J, Anto JM, et al. Effect of the method of administration, mail or telephone, on the validity and reliability of a respiratory health questionnaire. The Spanish centers of the European Asthma Study. J Clin Epidemiol 1998;51:875–81.
  13. Medical Research Council Committee on Research into Chronic Bronchitis. Standardized questionnaire on respiratory symptoms. Br Med J 1960;2:1665.
  14. Gulsvik A. Prevalence and manifestations of obstructive lung disease in the city of Oslo. Scand J Respir Dis 1979;60:286–96.
  15. Brogger J, Bakke PS, Gulsvik A. Comparison of two respiratory symptoms questionnaires. Int J Tuberc Lung Dis 2000;4:83–90.
  16. Fleiss JL. The measurement of interrater agreement. In: Statistical methods for rates and proportions. New York, NY: Wiley, 1981:212–35.
  17. Donner A, Eliasziw M. A goodness-of-fit approach to inference procedures for the kappa statistic: confidence interval construction, significance-testing, and sample size estimation. Stat Med 1992;11:1511–19.
  18. Lin LI. A concordance correlation coefficient to evaluate reproducibility. Biometrics 1989;45:255–68.
  19. Lawless JF. Negative binomial and mixed Poisson regression. Can J Stat 1987;15:209–25.
  20. Stata Corporation. Stata statistical software: Release 7.0. College Station, TX: Stata Corporation, 2001.
  21. Asch DA, Christakis NA. Different response rates in a trial of two envelope styles in mail survey research. Epidemiology 1994;5:364–5.
  22. Guggenmoos-Holzmann I. The meaning of kappa: probabilistic concepts of reliability and validity revisited. J Clin Epidemiol 1996;49:775–82.
  23. Feinstein AR, Cicchetti DV. High agreement but low kappa. I. The problems of two paradoxes. J Clin Epidemiol 1990;43:543–9.
  24. Burney P, Laitinen L, Perdrizet S, et al. Validity and repeatability of the IUATLD (1984) Bronchial Symptoms Questionnaire: an international comparison. Eur Respir J 1989;2:940–5.

Received for publication April 10, 2001. Accepted for publication September 25, 2001.