Control Selection Strategies in Case-Control Studies of Childhood Diseases

Xiaomei Ma1, Patricia A. Buffler2, Michael Layefsky3, Monique B. Does2 and Peggy Reynolds4

1 Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT.
2 Division of Public Health Biology and Epidemiology, University of California, Berkeley, CA.
3 Public Health Institute, Oakland, CA.
4 Environmental Health Investigations Branch, California Department of Health Services, Oakland, CA.

Received for publication October 2, 2003; accepted for publication December 17, 2003.


    ABSTRACT
 
To address concerns regarding the representativeness of controls in case-control studies, two selection strategies were evaluated in a study of childhood leukemia, which commenced in California in 1995. The authors selected two controls per case: one from among children identified by using computerized birth records and located successfully, the other from a roster of friends; both were matched on demographic factors. Sixty-four birth certificate–friend control pairs were enrolled (n = 128). Additionally, 192 "ideal" controls were selected without tracing from the birth records. Data on parental ages, parental education, mother’s reproductive history, and birth weight were obtained from the birth certificates of all 320 subjects. For all variables except birth weight, the differences between birth certificate and ideal controls were smaller than those between friend and ideal controls. None of the differences between birth certificate and ideal controls was significant, whereas two factors were significantly different between friend and ideal controls. These findings suggest that friend controls may be less representative than birth certificate controls. Despite difficulty in tracing and a seemingly low participation rate (49.0% for 560 enrolled birth certificate controls), using birth records to recruit controls appears to provide a representative sample of children and an opportunity to assess the representativeness of controls.

case-control studies; child; epidemiologic methods; leukemia

Abbreviations: NCCLS, Northern California Childhood Leukemia Study; RDD, random digit dialing.


    INTRODUCTION
 
Editor’s note: An invited commentary on this article is published on page 922, and the authors’ response is on page 925.

In case-control studies, selection of an appropriate control group is critical because study conclusions are based on a comparison of the exposure histories provided by cases and controls. Three principles of comparability between cases and controls have been suggested: 1) all comparisons should be made within the study base, 2) comparisons of the effects of exposure levels on disease risk should not be distorted by the effects of other factors, and 3) any errors in measuring exposure should be nondifferential between cases and controls (1). Various types of controls, such as population-based controls, hospital controls, neighborhood controls, and friend controls, have different strengths and limitations. However, few studies have generated empirical data to evaluate different strategies for control selection in adults (2, 3) or children (4–6).

During an early stage of the Northern California Childhood Leukemia Study (NCCLS), we randomly selected two controls for each case, one from computerized California birth records, the other from a roster of friends provided by families of cases. To evaluate the representativeness of controls resulting from these two different control selection strategies, we compared selected characteristics of these two control groups with those of a third control group comprised of "ideal" population-based controls who would have been obtained under optimal circumstances. The results of this evaluation are presented in this paper. Because the use of birth records for population-based control selection is gaining importance for studies of disease risks in US children, we also describe our experience with recruiting 560 birth certificate controls.


    MATERIALS AND METHODS
 
The NCCLS commenced in 1995 and is currently ongoing. The study is composed of three phases. Phase 1 includes cases diagnosed between August 1995 and November 1999, while phases 2 and 3 include cases diagnosed after December 1, 1999. Incident cases of newly diagnosed childhood leukemia (in children aged 0–14 years) are prospectively ascertained (usually within 72 hours after diagnosis) from major pediatric clinical centers in Northern and Central California. Although case ascertainment is hospital based, a comparison with all population-based cases ascertained by the statewide California Cancer Registry (1997–1999) shows that the NCCLS protocol successfully identified 88 percent of all age-eligible, newly diagnosed childhood leukemia cases among residents of the San Francisco–Oakland Metropolitan Statistical Area over this 3-year period.

During the first few years of phase 1, two controls were randomly selected for each case: one from the statewide electronic birth files maintained by the Center for Health Statistics of the California Department of Health Services (birth certificate control), the other from a roster of friends nominated by the family of the case (friend control). Both controls were individually matched to cases on age (date of birth for birth certificate controls and ±1 year for friend controls), gender, Hispanic status (a child is considered Hispanic if either parent is Hispanic), maternal county of residence at the time of the index child’s birth, and maternal race (White, African American, or other). To be eligible, each case or control had to 1) reside in the study area, 2) be less than 15 years of age at the reference date (time of diagnosis for cases and the corresponding date for matched controls), 3) have at least one parent or guardian who speaks English or Spanish, and 4) have no previous history of any malignancy. Friend controls were also screened to make certain that they were not blood relatives of the case. If the family of the case could not nominate a list of friends of exactly the same age, gender, race, and Hispanic status, the matching criteria were relaxed.
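The four eligibility criteria listed above can be expressed as a simple check. The following is a minimal sketch only: the record layout and field names are hypothetical and are not taken from the NCCLS data system.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical record layout; field names are illustrative assumptions.
@dataclass
class Subject:
    birth_date: date
    reference_date: date           # diagnosis date for cases, matched date for controls
    resides_in_study_area: bool
    parent_languages: frozenset    # languages spoken by at least one parent or guardian
    previous_malignancy: bool


def age_in_years(birth: date, on: date) -> int:
    """Completed years of age on the given date."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)


def is_eligible(s: Subject) -> bool:
    """Apply the four eligibility criteria described in the text."""
    return (
        s.resides_in_study_area
        and age_in_years(s.birth_date, s.reference_date) < 15
        and bool(s.parent_languages & {"English", "Spanish"})
        and not s.previous_malignancy
    )
```

The additional screening of friend controls (not a blood relative of the case) is not modeled here.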

Statewide birth records have been computerized in California since 1960. The record for each case born in California was identified and was used to select four potential controls. In phase 1, birth certificate controls were matched to cases born in the study area of 17 Bay Area counties on date of birth, gender, Hispanic status (either parent Hispanic), maternal race, and maternal county of residence, as listed on the birth certificate. The procedure was identical for cases born in California counties that are not part of the NCCLS study area (5 percent of all cases), except that county of residence at diagnosis was used for matching. For cases born outside of California (7 percent of cases), information on maternal race and Hispanic status was assessed by hospital staff rather than from the birth certificate, and the county of residence at diagnosis was used for matching.

The birth certificates for the four potential birth certificate controls and the case (if born in California) were obtained from the Center for Health Statistics. One of the four birth certificate controls was randomly selected as the first potential control to be recruited for the study. Using personal identifiers from the birth certificate, such as names and addresses, we located potential controls by using reverse directories and commercially available Internet-based search tools. Professional interviewers contacted each family using standardized searching protocols (e.g., calling a set number of times during prescribed hours). In some instances, relatives or neighbors were contacted in an effort to reach a family. If telephone contact was unsuccessful and there was a likely current address, contact was made by a letter and/or home visit. If the first-choice control could not be located, was ineligible, or declined to participate, the next randomly selected potential control was pursued. This procedure was repeated until an eligible and consenting birth certificate control was enrolled in the study. If no control was enrolled by using the first set of four birth certificates, additional certificates were requested from the Center for Health Statistics and the process described above was repeated.
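The sequential recruitment procedure described above (pursue the potential controls in random order until one is eligible and consents) can be sketched as follows. This is an illustration under stated assumptions, not the study's actual tracing software: `try_enroll` is a hypothetical callback standing in for the whole locate, screen, and consent process.

```python
import random
from typing import Callable, Optional, Sequence, Tuple


def recruit_control(
    candidates: Sequence[str],
    try_enroll: Callable[[str], bool],
    rng: random.Random,
) -> Tuple[Optional[str], int]:
    """Pursue candidates in random order until one is enrolled.

    Returns (enrolled candidate or None, number of candidates pursued).
    `try_enroll` is a hypothetical stand-in that returns True only if the
    candidate was located, found eligible, and agreed to participate.
    """
    order = list(candidates)
    rng.shuffle(order)  # random order of pursuit among the set of certificates
    for attempts, candidate in enumerate(order, start=1):
        if try_enroll(candidate):
            return candidate, attempts
    return None, len(order)
```

A result of `None` corresponds to the situation in which the first set of four certificates is exhausted and additional certificates are requested.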

The families of the cases were each asked to nominate three children who met the matching and eligibility criteria listed above and whose parents might be willing to participate in the study. The parents of one nominee were randomly selected to be contacted by telephone to introduce the study and further determine the eligibility of the potential control. If the potential friend control did not meet the criteria or refused to participate in the study, a second of the three friends was randomly selected and the protocol repeated.

Evaluation of the two different control selection strategies
The use of birth records to recruit population-based controls was not common in the 1990s. It was assumed, but unknown, that birth certificate controls enrolled in the study would be representative of the population base from which the cases arose. Furthermore, contrary to some assumptions (7), recruitment of friend controls in our study posed many unexpected logistical problems. Consequently, in 1999, we evaluated the methodological aspects of the two control selection strategies. By that time, both a birth certificate control and a friend control had been selected and interviewed for 64 cases. In addition, we used birth records to randomly select 192 "ideal" controls from among children who had been born in the 17-county study area, matched 3:1 to the 64 cases on date of birth, gender, Hispanic status, and maternal race. These controls were not matched on county of residence; beginning with phase 2 (late 1999), county of residence has not been used as a matching factor because of concerns about overmatching. These controls were considered ideal because they were exactly population based, did not need to be traced, and therefore were not subject to attrition because of the inability to locate them. Most matching and eligibility criteria could be directly assessed for these children (i.e., birth date, race, Hispanic status, birth residence), although some eligibility criteria could not (current residence, previous malignancy, spoken language of the parents). In addition, these controls did not need to be contacted, which precluded the possibility of refusals. For the purposes of this evaluation, all ideal controls were assumed eligible.
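The 3:1 selection of ideal controls can be illustrated with a short sketch. The record fields and the `exclude_id` parameter are assumptions made for the example; the actual layout of the California birth files is not described in this paper.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical matching key: (date of birth, gender, Hispanic status, maternal race).
MatchKey = Tuple[str, str, bool, str]


def select_ideal_controls(
    case_key: MatchKey,
    birth_records: List[Dict],
    rng: random.Random,
    per_case: int = 3,
    exclude_id: Optional[int] = None,
) -> List[Dict]:
    """Randomly draw `per_case` birth records that share the case's matching key.

    No tracing or contact is involved, so the draw depends only on the
    birth file itself. `exclude_id` keeps the case's own record out of the pool.
    """
    pool = [
        r for r in birth_records
        if (r["birth_date"], r["gender"], r["hispanic"], r["maternal_race"]) == case_key
        and r["record_id"] != exclude_id
    ]
    if len(pool) < per_case:
        raise ValueError("fewer matching birth records than controls requested")
    return rng.sample(pool, per_case)  # sampling without replacement
```

County of residence is deliberately absent from the key, mirroring the statement that ideal controls were not matched on county.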

Data on parental ages, birth weight, total number of livebirths (including the newborn), total number of previous pregnancy losses (including spontaneous abortions and stillbirths), and time since last livebirth were obtained directly from the birth certificates for all 320 subjects (64 birth certificate controls, 64 friend controls, and 192 ideal controls). These variables were selected because they were recorded consistently on birth certificates (withstanding changes to the format of the California birth certificate over the last two decades) and some previous studies reported associations of these factors with the risk of childhood leukemia (8–11). For subjects born after 1988, information was also available for parental years of education. Enrolled birth certificate controls and friend controls were compared with the ideal controls. Of the eight variables reviewed, maternal age and paternal and maternal years of education appeared to have an approximately normal distribution. Student’s t test was used to compare those three variables, while the nonparametric Wilcoxon rank-sum test was used to compare the other five variables. The eight variables were also converted into various categories, a format that is more conventional in epidemiologic studies but often reduces statistical efficiency. The chi-square test was used to assess whether the distributions of these variables were the same between different control groups.
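Two of the comparisons named above can be sketched with the Python standard library alone. The code below is an illustration, not the software used in the study: the rank-sum test uses the large-sample normal approximation with a tie correction and no continuity correction (an assumption of this sketch), and the chi-square function returns only the statistic and its degrees of freedom. Student's t test is not shown.

```python
import math
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Wilcoxon rank-sum test via the normal approximation.

    Returns (z, two-sided p value).
    """
    pooled = list(x) + list(y)
    n1, n2, n = len(x), len(y), len(pooled)
    w = sum(midranks(pooled)[:n1])          # rank sum of the first group
    mean_w = n1 * (n + 1) / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:                          # every observation tied
        return 0.0, 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    return z, math.erfc(abs(z) / math.sqrt(2))


def chi_square_statistic(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(table[0]) - 1)
```

For a comparison such as friend versus ideal controls, `x` and `y` would hold one variable (for example, months since last livebirth) for the two groups; with 64 and 192 subjects, the normal approximation is a reasonable choice.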

Recruitment of population-based controls using the computerized California birth files
As of April 2003, from a total of 1,489 potential controls who had been considered, 560 birth certificate controls were enrolled in the NCCLS. Details of the various search scenarios were recorded, and the participation rate was calculated and reported (figure 1). The number of potential controls considered for each eligible and consenting case was also documented. In addition, we recorded the number of searches conducted for birth certificate controls in various age groups (0–4, 5–9, and 10–14 years) and different ethnic categories (Hispanic and non-Hispanic).



 
FIGURE 1. Selection of controls in the Northern California Childhood Leukemia Study, August 1995–April 2003. * The authors assumed that the same percentage would be eligible as for potential controls who were found and whose eligibility to participate in the study was assessed.

 

    RESULTS
 
For all variables except birth weight, the differences between birth certificate and ideal controls were smaller than those between friend and ideal controls (table 1). None of the differences between birth certificate and ideal controls reached statistical significance. The p values for the differences between friend and ideal controls were 0.11 for maternal age, 0.33 for paternal age, 0.23 for maternal education, 0.17 for paternal education, 0.62 for birth weight, 0.18 for total number of livebirths, 0.04 for total number of previous pregnancy losses (including spontaneous abortions and stillbirths), and 0.03 for number of months since last livebirth (table 1). Total number of previous pregnancy losses and time since last livebirth, two variables indicated as risk factors for childhood leukemia in some studies (8, 9), were significantly different between the friend and ideal controls but not between the population-based birth certificate and ideal controls. When the variables were made categorical, the results of chi-square tests showed a similar but less clear-cut picture, reflecting the loss of precision with categorical data (table 2). The differences between friend and ideal controls were generally larger than those between birth certificate and ideal controls.


 
TABLE 1. Comparison of selected characteristics of birth certificate and friend controls with those of ideal population-based controls in the Northern California Childhood Leukemia Study, 1995–1999
 

 
TABLE 2. Distribution of selected birth characteristics among different controls in the Northern California Childhood Leukemia Study, 1995–1999
 
In addition to the methodological implications outlined, recruitment of friend controls was challenging because many families of cases (already overwhelmed by the diagnosis and treatment) were uncomfortable and reluctant to burden their friends. This problem was especially apparent with the families of Hispanic cases. Culturally, the Hispanic population in the study area is often less inclined than other populations to openly discuss a child’s serious disease with friends. Consequently, we decided to discontinue recruiting friend controls and to exclude from the main study analysis data obtained by interviewing friend controls.

Recruitment of birth certificate controls has continued since the study began in 1995. As of April 2003, the overall participation rate of birth certificate controls in the NCCLS was 49.0 percent—the number of controls who had been enrolled (n = 560) divided by the number of birth certificate controls sought, excluding the confirmed and presumed ineligibles (n = 1,142) (figure 1). The participation rate for controls in the NCCLS increased from 45.1 percent in phase 1 (1995–1999) to 51.6 percent in phases 2 and 3 (1999–present) after the searching techniques were refined to develop and implement more culturally sensitive protocols. The number of searches conducted for each enrolled birth certificate control ranged from one to 16, with an average of 2.66 (2.78 per Hispanic control and 2.58 per non-Hispanic control). The average numbers of searches conducted for enrolled controls in the age groups 0–4 years, 5–9 years, and 10–14 years were 2.38, 3.14, and 2.85, respectively.
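The participation rate defined above is a simple ratio, and the arithmetic can be checked directly from the figures reported in the text. The function name is illustrative; the count of 347 excluded potential controls is derived here as 1,489 minus 1,142 and combines confirmed and presumed ineligibles, whose separate counts are not given in the text.

```python
def participation_rate(enrolled: int, sought: int, ineligible: int) -> float:
    """Enrolled controls divided by controls sought, excluding ineligibles."""
    eligible_sought = sought - ineligible
    if eligible_sought <= 0:
        raise ValueError("no eligible controls in the denominator")
    return enrolled / eligible_sought


# Figures reported in the text (as of April 2003).
considered = 1489               # potential controls considered
eligible_denominator = 1142     # sought, excluding confirmed and presumed ineligibles
ineligible = considered - eligible_denominator  # 347, derived by subtraction

rate = participation_rate(560, considered, ineligible)
print(f"{100 * rate:.1f} percent")  # 49.0 percent
```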


    DISCUSSION
 
This evaluation suggests that friend controls are likely to be less representative than birth certificate controls of the population base from which the cases arose and that inclusion of friend controls may distort exposure-disease association. On the other hand, it was reassuring to find that birth certificate controls selected by using the NCCLS protocol were comparable to the study base population as estimated from ideal controls who would have been selected under optimal circumstances. Given our experience in using birth records to recruit population-based controls less than 15 years of age in California, we find that this method is feasible, although considerable resources are required to trace potential controls. Even though the participation rates obtained may be low, there is less evidence of biased sampling, and these controls are less likely than controls obtained by other methods to introduce misleading conclusions.

Many case-control studies of childhood malignancies conducted in the United States have used random digit dialing (RDD) to recruit controls (12–15). Although RDD has certain advantages (16), evidence is growing that using RDD for control selection may result in a control group biased with respect to socioeconomic status, residential stability, and number of siblings, population characteristics that may be related to the probability of a variety of exposures (12, 17, 18). Consequently, we decided not to use RDD when the NCCLS began in 1995. Recent experience has confirmed our decision (19).

During the first few years of the NCCLS, friend controls were chosen in preference to RDD controls, and we sought to evaluate the feasibility of using birth certificate and friend controls. At the outset of the study in 1995, it was unclear whether it would be possible to use birth records to enroll population-based controls, and the NCCLS was one of the first studies of childhood malignancy in the United States to use such a method. Paradoxically, at that time, it was believed that friend controls would be relatively convenient and less expensive to recruit (7). This evaluation of control selection strategies using population-based "ideal" controls indicates that, in addition to logistic difficulties (especially with our Hispanic study population), friend controls are less useful, a finding consistent with earlier assessments (20, 21). The total number of previous pregnancy losses and time since last livebirth were significantly different between the friend and ideal controls but not between the population-based birth certificate and ideal controls. Had friend controls been used as the comparison group in future case-control analyses, spurious associations between these two factors and disease risk might have been generated.

Selection of population controls from a primary base ensures that the controls are drawn from the same population as the case series (7), a major methodological advantage. The practical challenge, however, is that there is usually no readily available population roster in the United States, as opposed to many European countries. In the NCCLS, we consider birth records preferred sources of controls since we conceptualize a population base consisting of children who were born in the 17-county study area and were residents of the study area at the reference date. However, a relatively small percentage (12 percent) of cases was not born in the study area (approximately 5 percent were born in other counties in California and 7 percent outside of California). To make the population base for cases and controls comparable, these cases would need to be excluded. In all NCCLS data analyses conducted to date, we have consistently compared the results for all cases with the results for cases born in the study area. Although no significant differences have been identified, we will continue to analyze the data in this manner. If any discrepancies are noted in future analyses, we plan to report results derived from analysis of cases born in the study area as well as all cases.

The "ideal" controls included in the evaluation are less than ideal in some aspects; some may not be eligible for the study because they have moved out of the study area, do not have an English- or Spanish-speaking parent, or had a malignancy (which would be extremely rare). These aspects would be problematic if there are significant differences regarding the exposure to leukemia risk factors between potential controls who moved out of the study area and those who did not, or between potential controls who have English- or Spanish-speaking parents and those who do not. These aspects represent an inherent limitation of this approach to control selection. On the other hand, the fact that these potential controls did not need to be traced and there was no possibility of refusals still provides a useful basis for this methodological evaluation.

Participation rate is sometimes considered an indicator of the potential for selection bias. The participation rate for birth certificate controls in the NCCLS is comparable to the 48.8 percent rate reported in a recently published study of childhood leukemia in New York State that used birth certificate controls (22). Furthermore, the participation rate for controls in the NCCLS is higher than the rate reported in a recent study with a similar design, which was conducted in California with cases of sudden infant death syndrome reported during 1997–2000 (41.3 percent) (23). Although the participation rates for birth certificate controls appear low, it is important to note that participation rate is not the most relevant consideration for inference purposes. The representativeness of the participating controls or comparability to the underlying study population is far more important. Lower participation rates do not necessarily limit inferences from a study as long as the participating controls reflect the exposure distribution of the source population of the cases. Moreover, the data available for calculating participation rates are not uniform across different studies, and, because of considerable missing data, it is sometimes impossible to calculate informative participation rates. For example, the true denominator necessary to calculate participation rates for RDD controls usually cannot be determined because there is often no information about the people who screen calls, do not answer, hang up immediately, or refuse to talk to the caller.

The average number of searches that needed to be conducted to enroll a birth certificate control was somewhat higher for Hispanic children than for non-Hispanic children. Age also appeared to affect the number of searches. Fewer searches were needed for each younger birth certificate control (aged 0–4 years) than for an older one (aged 5–14 years).

Using birth certificates to select controls can be problematic if exposure or outcome variables are associated with the probability of a potential control being located. It will be important to compare the birth characteristics of potential controls who are successfully traced with the birth characteristics of those who are not located to determine whether they are systematically different. We are pursuing these additional analyses.

In summary, our experience in California indicates that friend controls may not be representative of the study population and that there may be systematic differences by ethnic group in analyses in which friend controls are used. However, we did observe that it is feasible to use birth records to successfully recruit population-based controls, despite the efforts needed to trace potential controls and the inability to locate a significant proportion of them (19.5 percent). The use of birth records, if applied appropriately, can serve as an important alternative to available control selection strategies for case-control studies of childhood diseases.


    ACKNOWLEDGMENTS
 
This work was supported by two research grants from the National Institute of Environmental Health Sciences (PS42 ES04705 and R01 ES09137).

The ideas and opinions expressed are those of the authors, and no endorsement by the California Department of Health Services should be inferred.

The authors thank clinical investigators at the collaborating hospitals for help in recruiting patients. They also thank the California Department of Health Services, Center for Health Statistics, and the California Cancer Registry for providing data used in these analyses.


    NOTES
 
Correspondence to Dr. Xiaomei Ma, Department of Epidemiology and Public Health, Yale University School of Medicine, 60 College Street, New Haven, CT 06520-8034 (e-mail: xiaomei.ma@yale.edu).


    REFERENCES
 

  1. Wacholder S, McLaughlin JK, Silverman DT, et al. Selection of controls in case-control studies. I. Principles. Am J Epidemiol 1992;135:1019–28.
  2. Ben-Shlomo Y, Markowe H, Shipley M, et al. Stroke risk from alcohol consumption using different control groups. Stroke 1992;23:1093–8.
  3. Moritz DJ, Kelsey JL, Grisso JA. Hospital controls versus community controls: differences in inferences regarding risk factors for hip fracture. Am J Epidemiol 1997;145:653–60.
  4. Lieff S, Olshan AF, Werler M, et al. Selection bias and the use of controls with malformations in case-control studies of birth defects. Epidemiology 1999;10:238–41.
  5. Infante-Rivard C, Jacques L. Empirical study of parental recall bias. Am J Epidemiol 2000;152:480–6.
  6. Infante-Rivard C. Hospital or population controls for case-control studies of severe childhood diseases? Am J Epidemiol 2003;157:176–82.
  7. Wacholder S, Silverman DT, McLaughlin JK, et al. Selection of controls in case-control studies. II. Types of controls. Am J Epidemiol 1992;135:1029–41.
  8. Kaye SA, Robison LL, Smithson WA, et al. Maternal reproductive history and birth characteristics in childhood acute lymphoblastic leukemia. Cancer 1991;68:1351–5.
  9. Yeazel M, Buckley J, Woods W, et al. History of maternal fetal loss and increased risk of childhood acute leukemia at an early age. Cancer 1995;75:1718–27.
  10. Ross JA, Potter JD, Shu XO, et al. Evaluating the relationships among maternal reproductive history, birth characteristics, and infant leukemia: a report from the Children’s Cancer Group. Ann Epidemiol 1997;7:172–9.
  11. Yeazel M, Ross J, Buckley J, et al. High birth weight and risk of specific childhood cancers: a report from the Children’s Cancer Group. J Pediatr 1997;131:671–7.
  12. Savitz DA, Wachtel H, Barnes FA, et al. Case-control study of childhood cancer and exposure to 60-Hz magnetic fields. Am J Epidemiol 1988;128:21–38.
  13. Buckley JD, Robison LL, Swotinsky R, et al. Occupational exposures of parents of children with nonlymphocytic leukemia: a report from the Childrens Cancer Study Group. Cancer Res 1989;49:4030–7.
  14. Peters J, Preston-Martin S, London SJ, et al. Processed meats and risk of childhood leukemia: California, USA. Cancer Causes Control 1994;5:195–202.
  15. Shu XO, Ross J, Pendergrass T, et al. Paternal alcohol consumption, cigarette smoking, and risk of infant leukemia: a Childrens Cancer Group Study. J Natl Cancer Inst 1996;88:24–31.
  16. Sakkinen PA, Severson RK, Ross JA, et al. Random-digit dialing for control selection in childhood cancer studies: the geographic proximity and demographics within matched sets. Am J Public Health 1995;85:555–7.
  17. Greenberg ER. Random digit dialing for control selection: a review and a caution on its use in studies of childhood cancer. Am J Epidemiol 1990;131:1–5.
  18. London SJ, Thomas DC, Bowman JD, et al. Exposure to residential electric and magnetic fields and risk of childhood leukemia. Am J Epidemiol 1991;134:923–37.
  19. Brogan DJ, Denniston MM, Liff JM, et al. Comparison of telephone sampling and area sampling: response rates and within-household coverage. Am J Epidemiol 2001;153:1119–27.
  20. Flanders WD, Austin H. Possibility of selection bias in matched case-control studies using friend controls. Am J Epidemiol 1986;124:150–3.
  21. Robins J, Pike M. The validity of case-control studies with nonrandom selection of controls. Epidemiology 1990;1:273–84.
  22. Rosenbaum PF, Buck GM, Brecher ML. Early child-care and preschool experiences and the risk of childhood acute lymphoblastic leukemia. Am J Epidemiol 2000;152:1136–44.
  23. Li DK, Petitti DB, Willinger M, et al. Infant sleeping position and the risk of sudden infant death syndrome in California, 1997–2000. Am J Epidemiol 2003;157:446–55.

Related articles in Am. J. Epidemiol.:

Invited Commentary: Birth Certificates—A Best Control Scenario?
Julie A. Ross, Logan G. Spector, Andrew F. Olshan, and Greta R. Bunin
Am. J. Epidemiol. 2004 159: 922-924.

Ma et al. Respond to "Birth Certificates—A Best Control Scenario?"
Xiaomei Ma, Patricia A. Buffler, Michael Layefsky, Monique B. Does, and Peggy Reynolds
Am. J. Epidemiol. 2004 159: 925.