1 Joint Departments of Epidemiology and Biostatistics, and Occupational Health, Faculty of Medicine, McGill University, Montréal, Québec, Canada.
2 Department of Community Health Sciences, Faculty of Medicine, Sherbrooke University, Sherbrooke, Québec, Canada.
ABSTRACT |
---|
bias (epidemiology); case-control studies; child; electromagnetic fields; epidemiologic methods; leukemia; recall; x-rays
Abbreviations: GA, geographic area in which people were concerned about an excess of acute lymphoblastic leukemia in childhood.
INTRODUCTION |
---|
Recall bias is of particular concern in case-control studies in which questionnaire data are used; the disease has already occurred when exposure information is obtained from the subjects (or from a surrogate such as a parent), and exposure often took place a long time ago. Case-control studies looking at risk factors associated with perinatal outcomes or severe childhood diseases may be prone to this bias because the parents of the affected children may, for instance, search actively for an explanation of the disease or have an assumption about its underlying cause (3, 4). On the other hand, under these circumstances, the delay between occurrence of exposure and incidence of disease may be much shorter than it is with chronic diseases in adults. Not many published empirical studies of recall bias have used an adequate standard to assess the validity of parental reporting (5–16). Nevertheless, most indicate that recall is often imperfect but not very different between cases and controls.
We conducted a validation substudy to assess recall bias within the framework of a case-control study on risk factors for acute lymphoblastic leukemia in childhood, such as exposure to pesticides in and around the home (17). For circumstantial and feasibility reasons, we chose to assess the validity of two variables: reported distance of the home from a power line and reported radiographic examinations during pregnancy. Distance of a home to power lines is a surrogate for exposure to electromagnetic fields, a possible risk factor for childhood leukemia; diagnostic radiation during pregnancy, in particular that resulting from pelvimetry and abdominal radiographic examinations, is a recognized risk factor for childhood cancer. Of note, while the parent case-control study was being carried out, citizens in one of the geographic areas included in the study were very concerned about a perceived excess of childhood acute lymphoblastic leukemia that they attributed to proximity to power lines.
MATERIALS AND METHODS |
---|
Validation substudy
Distance to power lines. The measured distance between a residence and power lines was available for houses on Montréal Island, one of the regions included in the parent study. Therefore, only those cases and controls who lived on Montréal Island and were part of the parent study were included in the analysis. There were 374 such subjects, and the distance to power lines was coded for 359 of them. The reported distance was compared with the measured distance only at time of diagnosis (or at ascertainment for controls).
The distance from a residence to power lines at the time of diagnosis was first determined by using the transportation and energy network map from Hydro-Québec, the provincial electric utility. The data were from 1987, and the distance was precise to 220–400 m. However, the Ministry of Energy and Resources of the province has a map of the power lines that is precise up to 5 m (but contains no voltages). The addresses of study subjects (with case-control status blinded) and MapInfo software (18) were used to estimate map coordinates and determine distance to the power lines. The modified transverse Mercator system (a geometric combination to represent the curved surface of the earth on a flat surface) was used to establish the coordinates. Since the geocoding process itself included a margin of error of 15 m, in the end the distances were estimated with a precision of about 20 m (15 m from the geocoding process plus 5 m from using the Ministry map). Distances were established to 69, 120, 230, 315, and 735 kV overhead power lines.
In the parent case-control study, the distance to power lines was measured by asking the following question: "Within a radius of 1 km (1,000 m) of your house, was there a high-voltage power line?" A trained interviewer administered the questionnaire by telephone, and this question was answered by the mother. If any overhead power line was measured within 1,000 m of the home and the question was answered yes, the report was considered correct; if there was no line within 1,000 m and the answer was yes, or the reverse, the report was considered incorrect.
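The classification rule described above can be sketched in code. The following minimal Python example is not part of the original analysis: the function name and the sample pairs are hypothetical, and only the 1,000-m threshold comes from the questionnaire item. It cross-tabulates the mother's answer against the measured distance and derives sensitivity and specificity from the resulting counts.

```python
from typing import Iterable, Tuple

THRESHOLD_M = 1000.0  # radius named in the questionnaire item


def sensitivity_specificity(pairs: Iterable[Tuple[float, bool]]) -> Tuple[float, float]:
    """Each pair is (measured distance in metres, mother answered "yes")."""
    tp = fp = fn = tn = 0
    for measured_m, reported_yes in pairs:
        line_present = measured_m <= THRESHOLD_M  # measured line within 1,000 m
        if line_present and reported_yes:
            tp += 1  # correct "yes"
        elif line_present and not reported_yes:
            fn += 1  # line present but not reported
        elif reported_yes:
            fp += 1  # line reported but none measured
        else:
            tn += 1  # correct "no"
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one home with and one without a line")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data, for illustration only.
example = [(250, True), (800, False), (950, True),
           (1500, False), (3000, True), (4000, False)]
sens, spec = sensitivity_specificity(example)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Computing the two measures separately for each group (GA cases, other cases, population controls, hospital controls) is what allows differential recall to be detected.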
We compared the sensitivity and specificity of the reported distance with the measured distance for the cases in the geographic area in which people were concerned about an excess of acute lymphoblastic leukemia in childhood (GA cases) (n = 32), for other Montréal Island cases (n = 95), for population controls (n = 110), and for hospital controls (n = 122). Confidence intervals for the difference between two independent proportions were estimated for sensitivity and specificity, and the difference was tested by using a z test (19). We estimated crude unmatched odds ratios and 95 percent confidence intervals to compare the effect from reported and measured distances. Pair matching from the parent study could not be retained in this analysis; cases from Montréal Island were not necessarily matched to controls from this region because a matching region that was broader than just Montréal Island was used.
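The two computations named in this paragraph can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the authors' code: the source cites reference 19 without giving formulas, so the sketch uses common textbook choices (an unpooled standard error for the confidence interval, a pooled standard error for the z test, and a Woolf-type logit interval for the odds ratio). All function names and counts are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z95 = STD_NORMAL.inv_cdf(0.975)  # about 1.96


def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Difference p1 - p2 with a 95% CI and a two-sided z test."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Unpooled standard error for the confidence interval (assumed choice).
    se_ci = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (diff - Z95 * se_ci, diff + Z95 * se_ci)
    # Pooled standard error under the null hypothesis p1 == p2.
    pooled = (x1 + x2) / (n1 + n2)
    se_test = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se_test == 0:
        raise ValueError("pooled proportion is 0 or 1; z is undefined")
    z = diff / se_test
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return diff, ci, z, p_value


def crude_odds_ratio(a: int, b: int, c: int, d: int):
    """a, b: exposed and unexposed cases; c, d: exposed and unexposed controls."""
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf (logit) method
    lower = exp(log(odds_ratio) - Z95 * se_log)
    upper = exp(log(odds_ratio) + Z95 * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
print(two_proportion_z(25, 30, 50, 100))
print(crude_odds_ratio(30, 70, 20, 80))
```

Running the odds ratio function once on reported exposure and once on measured exposure gives the pair of estimates whose comparison shows how much recall error changes the apparent effect.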
Maternal radiographic examinations. The reported frequency of radiographic examinations during pregnancy was compared with information in the hospital medical record. In the parent case-control study, questions were asked about specific types of examinations or body regions (e.g., pelvimetry, abdominal radiographs) and referred to the pregnancy trimesters. Our substudy was carried out before the parent study was completed; the latter started in 1989 and included cases from 1980. The validation substudy of maternal radiographic examinations thus included cases for whom the delay between diagnosis (or ascertainment for controls) and maternal interview was longer than that for cases diagnosed from 1990 onward. We had estimated that about 100 records needed to be reviewed to reject a correlation of 0.6 between numbers of radiographic examinations and to uncover one of 0.8; however, we had no preliminary data to determine the likelihood of such estimates, so (because it was still feasible) we decided to approximately double the required number.
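The article does not state the significance level, power, or formula behind the figure of about 100 records. One common way to size a study that must distinguish between two correlation values is the Fisher z transformation. The sketch below applies it under assumed values of alpha and power that do not come from the source, so it illustrates the method and is not expected to reproduce the authors' number; the function name is hypothetical.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_two_correlations(rho0: float, rho1: float,
                           alpha: float = 0.05, power: float = 0.80,
                           two_sided: bool = True) -> int:
    """Sample size to test H0: rho = rho0 against the alternative rho = rho1."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_beta = std_normal.inv_cdf(power)
    # Fisher z: atanh(r) is approximately normal with variance 1 / (n - 3).
    effect = abs(atanh(rho1) - atanh(rho0))
    if effect == 0:
        raise ValueError("rho0 and rho1 must differ")
    return ceil(((z_alpha + z_beta) / effect) ** 2 + 3)


# Assumed alpha and power; the source does not give them.
for assumed_power in (0.80, 0.90, 0.95):
    n = n_for_two_correlations(0.6, 0.8, power=assumed_power)
    print(f"power={assumed_power:.2f}: n={n}")
```

The required number rises quickly with the assumed power, which is one reason a margin such as the doubling described above can be a sensible precaution when no pilot data are available.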
We first chose the hospitals in which women had delivered, on the assumption that medical record keeping and medical practice may differ between different types of hospitals. Two of the chosen hospitals were university affiliated and had over 600 beds each; two were general hospitals on the periphery of Montréal, one with 400–600 beds and the other with 200–400 beds; and two were out-of-town general hospitals, one with 200–400 beds and the other with 100–200 beds. We then chose a similar proportion of subjects in each hospital to reach a total of 188. Medical records were reviewed for 60 mothers of cases, 59 mothers of population controls, and 69 mothers of hospital controls.
Data on reported medical radiographic examinations were available for the mothers of 57 cases and 57 population controls and for the mothers of 66 hospital controls. The number of reported radiographic examinations was compared with the number in the hospital medical record. For this comparison, a weighted kappa statistic was used (20). Because the number of subjects with different frequencies of radiographic examinations was small (a frequency of one was the most common apart from none), we mainly compared the reporting or no reporting of a radiographic examination with the presence or absence of such an examination in the hospital medical record, regardless of frequency. Pelvimetry and abdominal radiographic examinations were of primary interest, but such examinations of other parts of the body, as well as ultrasound examinations, also were recorded. Dental radiographic examinations were excluded. Sensitivity and specificity of reported examinations were computed and differences in proportions were contrasted for cases and controls. Odds ratios and 95 percent confidence intervals comparing cases with each of the two control groups were estimated for reported data and the medical record.
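To make the weighted kappa comparison concrete, here is a minimal Python sketch. The source cites reference 20 without specifying the weighting scheme, so linear disagreement weights are assumed by default (quadratic weights are available as an option). The function name and the example counts are hypothetical.

```python
from typing import Sequence


def weighted_kappa(reported: Sequence[int], recorded: Sequence[int],
                   n_categories: int, squared: bool = False) -> float:
    """Cohen's weighted kappa for ordinal codes 0 .. n_categories - 1."""
    if len(reported) != len(recorded) or len(reported) == 0:
        raise ValueError("need two equally long, non-empty sequences")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    k, n = n_categories, len(reported)

    # Cross-tabulate maternal report (rows) against medical record (columns).
    table = [[0] * k for _ in range(k)]
    for a, b in zip(reported, recorded):
        table[a][b] += 1
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        # Disagreement weight: 0 on the diagonal, 1 for the farthest cells.
        d = abs(i - j) / (k - 1)
        return d * d if squared else d

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * table[i][j] for i, j in cells) / n
    expected = sum(weight(i, j) * row_tot[i] * col_tot[j] for i, j in cells) / n ** 2
    if expected == 0:
        return 1.0  # every subject falls in one category in both sources
    return 1 - observed / expected


# Hypothetical numbers of examinations (0, 1, or 2), for illustration only.
report = [0, 0, 0, 1, 1, 2, 0, 1]
record = [0, 0, 1, 1, 2, 2, 0, 1]
print(round(weighted_kappa(report, record, n_categories=3), 3))
```

When almost all subjects fall in the categories none and one, as was the case here, the statistic rests on very few off-diagonal cells, which is consistent with the decision to rely mainly on the dichotomous comparison.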
RESULTS |
---|
DISCUSSION |
---|
As mentioned previously, the two variables were chosen for circumstantial and practical reasons. They are different and resulted in different observations, except for a uniform underreporting. Such underreporting has been observed previously, particularly when persons are asked to remember exposures such as use of many different drugs, events that occurred a long time ago, or details such as medication dose or a precise date (5, 9, 11, 21). In the studies reviewed (5–16), the recall period varied from a few months to 10 years but most often was within 2 years. In the present study, the recall period generally was many years; however, there seemed to be no obvious correlation between delay to interview and validity estimates for reporting. This observation is more directly applicable to reported distance than to radiographic examinations. For the former, we asked about the distance to power lines at diagnosis; for the latter, the recall period could have been even longer if the child was older at diagnosis.
What factors could have distorted recall of the distance to power lines? First, the difficulty associated with estimating a long distance, such as that between the residence and power lines, probably affected accuracy. More specifically, the significant differential recall observed for a group of cases living in an area in which an excess of disease was perceived and was attributed to proximity to power lines could have been influenced by publicity from local newspapers. Because this publicity occurred while the parent case-control study was being conducted, it may have induced erroneous recall of this particular variable. A similar situation was observed in the late 1980s in a northern California region in which a perceived excess of spontaneous abortions and cardiac malformations was attributed to contaminated water (22). Some evidence from various epidemiologic studies conducted during this period suggests that recall bias could have accounted for part or all of the association between the adverse pregnancy outcomes and consumption of tap water (14, 23). However, a later prospective cohort study confirmed such an association, making recall bias a less likely explanation (24).
In addition, what specific factors could have influenced the poor recall of mothers with respect to prenatal radiographic examinations? Although contrary to expectation, a similar observation was made by Graham et al. (25) in a study on the relation between perinatal and postnatal irradiation and childhood leukemia. They found poor sensitivity for reporting of diagnostic radiographic examinations in a subgroup of 200 of 1,203 women; sensitivity was 20.4 percent when reporting was compared with physicians' and dentists' records and 33.5 percent in comparison with hospital records. The reasons for these observations are unclear. We can only speculate that mothers of children with cancer who underwent pelvimetry ignored the risk associated with the procedure or did not have a lingering concern, having accepted its risk given the expected benefit. Another hypothesis is that the mothers who had radiographic examinations other than pelvimetry, which constituted about 40 percent of all such examinations reported and most often were not abdominal, did not consider these tests harmful.
Exposure to ionizing radiation was mentioned in two other validation substudies, without further specification (i.e., therapeutic, diagnostic, or occupational) (9, 11). Detailed results were not shown, but this exposure was not reported to be affected by recall bias. Finally, in our study, the data on ultrasound examinations might indicate that pelvimetries were confounded by ultrasound examinations; however, this conclusion could explain only part of the underreporting. As mentioned previously, many radiographic examinations were not abdominal and could hardly be confounded by ultrasound examinations. Nevertheless, these findings reinforce the need to define clearly for respondents the procedures referred to in our questions.
For both exposures examined in this study, the objective measure used was a "gold standard," which was not often the case in the published studies. Most authors used the medical record as a reference (5–10, 12, 15); in three studies (11, 14, 16), comparisons were made between repeated interviews. The hospital medical record generally can be considered a gold standard for variables such as tests, procedures, and certain diagnoses except when care is provided outside the hospital, but it is not as accurate for variables such as common, mild conditions; lifestyle factors; and over-the-counter drugs (12, 15). The general view is that the medical record, although imperfect, enables unbiased comparison between cases and controls when exposure data are recorded independently of outcome (10, 12). However, with a small number of subjects and a low-prevalence exposure or one that is less likely to be recorded, the validity of the comparison between cases and controls could be affected. In our study, we are confident that the measured distance to power lines was a gold standard, if there is one, but the hospital medical record may not have included all radiographic examinations if, for example, subjects visited the emergency room of another center for treatment of an accident.
Other empirical studies using an adequate standard have assessed the validity of parental reporting by using a case-control design; various childhood diseases (leukemia, autistic disease, and sudden infant death syndrome) (7, 12, 16), pregnancy outcomes (e.g., congenital malformations, low birth weight, preterm birth, spontaneous abortion) (5, 6, 8–11, 13–15), and exposure variables (e.g., drugs, lifestyle factors, chemicals, history of diseases, symptoms) were studied. Including those in the present study, more than 100 variables were evaluated, but, based on our assessment, only a few from five studies showed evidence of recall bias (10, 11, 14, 15).
Werler et al. (10) compared information on eight variables obtained from postpartum personal interviews to that from the medical records of 105 mothers of malformed infants and 165 mothers of nonmalformed infants. Specificity could not be assessed, but sensitivity was reported to be higher for the mothers of cases for two variables: history of urinary tract infection during pregnancy and birth control after conception. The authors attributed the results for the latter variable to the publicized putative hazards of spermicides, oral contraceptives, and intrauterine devices during and around the time of the study. As to the results for urinary tract infection, no difference was found when mothers of less severe cases of malformation were compared with mothers of controls. Feldman et al. (11) compared many variables, including drugs, chemicals, and lifestyle factors, between a personal interview conducted during pregnancy and a telephone interview conducted after pregnancy for 33 mothers who had an adverse pregnancy outcome and 112 mothers whose pregnancy outcome was normal. Recall bias was found for alcohol consumption only. However, certain factors may explain these results, such as the effect of counseling between the two interviews, comparison of personal with telephone interviews, and use of open-ended questions.
Delgado-Rodriguez et al. (15) compared information on 12 variables (diseases, drugs, lifestyle factors, and prenatal care) from personal interviews after delivery with that from the medical record for 169 mothers of low-birth-weight infants and 198 mothers whose pregnancy outcome was normal. Sensitivity was significantly lower for controls regarding hypertension during pregnancy, while specificity was significantly lower for controls regarding antianemic drugs and anemia. Weight gain was underreported by both groups but more so by controls. The difference between the odds ratio estimated from the medical record and the one estimated from the interview was statistically significant for anemia only. No further explanation as to context was provided for this result. Fenster et al. (14) assessed the presence of recall bias in the association between tap water consumption and spontaneous abortion in a specific geographic area of California. They compared information obtained from a telephone interview after the case's spontaneous abortion (on average 24 weeks after the last menstrual cycle) with that from a telephone interview after the control gave birth (about 48 weeks after the last menstrual cycle) for 100 mothers who had a spontaneous abortion and for 200 controls. At the second interview, there was a shift toward reporting less consumption, more so for controls. In none of the other studies was recall bias present or sufficient to alter the risk estimates significantly (5–9, 12, 13, 16).
Overall, in the studies reviewed previously, inaccuracies were frequent but recall bias was rare and was found only for very specific exposures, such as when some of the following conditions were met: 1) when the public in a certain area perceived that an excess of a disease was attributed to a specific exposure (as in the present study for reported distance to power lines and in the study by Fenster et al. (14) for tap water consumption); 2) when an association was publicized (such as the putative hazard of birth control after conception reported in the study by Werler et al. (10)); or 3) when the exposure was socially undesirable (such as alcohol consumption reported in the study by Feldman et al. (11)). On the other hand, although smoking is considered a socially undesirable exposure (23), validity of reporting was excellent for the habit (9, 15) although somewhat less so for the levels of smoking reported (11).
An important issue raised by our data is related to choice of the control group. If parents are viewed as informers when a severe childhood disease or an adverse pregnancy outcome is being studied, it is reasonable to think that choosing a control group that includes parents of children affected by diseases of similar severity (and not associated with the exposure under study) may increase the probability of comparable accuracy of recall in comparison with population controls (4, 10, 26). However, little evidence is available from empirical studies on pregnancy outcome and childhood disease to support this choice, as all studies except that published by Mackenzie and Lippman (9) selected exclusively healthy controls. These authors specifically designed a nested case-control study to assess the effect of recall bias for 39 exposure variables that might influence pregnancy outcome. Women were asked the same questions twice, early in pregnancy and after delivery. Two control groups were chosen, one composed of mothers of children with less serious disease than the cases and one of mothers of healthy babies. Overall, the validity of the information was similar for the three groups when the two questionnaires were compared. In the other studies that used healthy controls, the choice of this comparison group was adequate in most situations (except the ones mentioned previously), as recall bias was not found.
Our results showed similar reporting for the other cases (those not part of a disease cluster) and for hospital controls regarding distance to power lines, which supports the choice of hospital controls; however, for maternal radiographic examinations, reporting was more similar for cases and population controls than for cases and hospital controls. The latter data were relatively sparse, and estimates of effects with reported data were not very different when population versus hospital controls were used. Unfortunately, the only conclusion to which these data led is that achieving comparable accuracy of recall by using hospital controls depends on what is being measured and under what circumstances.
In summary, the need to reduce the potential for recall bias in a study that relies on subject reporting to assess exposure cannot be emphasized enough. Realistically and in many circumstances, however, the ability to avoid recall difficulties is severely limited. Most frequently, the major problem in case-control studies of parental reporting is not differential recall but nondifferential misclassification. Use of more than one type of measure, in the absence of a gold standard, could reduce this problem (27).
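The distinction between differential recall and nondifferential misclassification can be illustrated numerically. The Python sketch below uses entirely hypothetical counts and validity values (none are taken from this study) and works with expected counts for a dichotomous exposure: when sensitivity and specificity are the same for cases and controls, the observed odds ratio moves toward the null, whereas better sensitivity among cases can push it away from the null.

```python
from typing import Tuple


def observed_counts(exposed: float, unexposed: float,
                    sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Expected numbers classified as exposed and unexposed by imperfect reports."""
    classified_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    classified_unexposed = (exposed + unexposed) - classified_exposed
    return classified_exposed, classified_unexposed


def odds_ratio(case_exp: float, case_unexp: float,
               ctrl_exp: float, ctrl_unexp: float) -> float:
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


# Hypothetical true exposure counts, for illustration only.
true_cases = (40, 60)      # exposed, unexposed
true_controls = (20, 80)
or_true = odds_ratio(*true_cases, *true_controls)

# Nondifferential: the same sensitivity and specificity in both groups.
or_nondiff = odds_ratio(*observed_counts(*true_cases, 0.6, 0.9),
                        *observed_counts(*true_controls, 0.6, 0.9))

# Differential: cases report true exposure more completely than controls.
or_diff = odds_ratio(*observed_counts(*true_cases, 0.9, 0.9),
                     *observed_counts(*true_controls, 0.6, 0.9))

print(f"true OR={or_true:.2f}, nondifferential OR={or_nondiff:.2f}, "
      f"differential OR={or_diff:.2f}")
```

Under these assumed values, the loss of effect from nondifferential error is larger than the distortion from the differential scenario, which is in line with the point that poor recall in both groups is often the more pressing problem.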
ACKNOWLEDGMENTS |
---|
The authors thank Marie-Claude Boivin for the geo-coding analysis and Dr. Howard Morrison for his initial suggestions.
REFERENCES |
---|