1 Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT.
2 Division of Public Health Biology and Epidemiology, University of California, Berkeley, CA.
3 Public Health Institute, Oakland, CA.
4 Environmental Health Investigations Branch, California Department of Health Services, Oakland, CA.
Received for publication October 2, 2003; accepted for publication December 17, 2003.
ABSTRACT |
---|
case-control studies; child; epidemiologic methods; leukemia
Abbreviations: NCCLS, Northern California Childhood Leukemia Study; RDD, random digit dialing.
INTRODUCTION |
---|
In case-control studies, selection of an appropriate control group is critical because study conclusions are based on a comparison of the exposure histories provided by cases and controls. Three principles of comparability between cases and controls have been suggested: 1) all comparisons should be made within the study base, 2) comparisons of the effects of exposure levels on disease risk should not be distorted by the effects of other factors, and 3) any errors in measuring exposure should be nondifferential between cases and controls (1). Various types of controls, such as population-based controls, hospital controls, neighborhood controls, and friend controls, have different strengths and limitations. However, few studies have generated empirical data to evaluate different strategies for control selection in adults (2, 3) or children (4–6).
During an early stage of the Northern California Childhood Leukemia Study (NCCLS), we randomly selected two controls for each case, one from computerized California birth records, the other from a roster of friends provided by families of cases. To evaluate the representativeness of controls resulting from these two different control selection strategies, we compared selected characteristics of these two control groups with those of a third control group comprised of "ideal" population-based controls who would have been obtained under optimal circumstances. The results of this evaluation are presented in this paper. Because the use of birth records for population-based control selection is gaining importance for studies of disease risks in US children, we also describe our experience with recruiting 560 birth certificate controls.
MATERIALS AND METHODS |
---|
During the first few years of phase 1, two controls were randomly selected for each case: one from the statewide electronic birth files maintained by the Center for Health Statistics of the California Department of Health Services (birth certificate control), the other from a roster of friends nominated by the family of the case (friend control). Both controls were individually matched to cases on age (date of birth for birth certificate controls and ±1 year for friend controls), gender, Hispanic status (a child is considered Hispanic if either parent is Hispanic), maternal county of residence at the time of the index child's birth, and maternal race (White, African American, or other). To be eligible, each case or control had to 1) reside in the study area, 2) be less than 15 years of age at the reference date (time of diagnosis for cases and the corresponding date for matched controls), 3) have at least one parent or guardian who speaks English or Spanish, and 4) have no previous history of any malignancy. Friend controls were also screened to make certain that they were not blood relatives of the case. If the family of the case could not nominate a list of friends of exactly the same age, gender, race, and Hispanic status, the matching criteria were relaxed.
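The four eligibility criteria listed above can be expressed as a simple screening check. The following is a minimal sketch only, not the study's actual software; the `Child` record, its field names, and the helper functions are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Child:
    # Hypothetical record layout; field names are illustrative assumptions.
    birth_date: date
    in_study_area: bool
    parent_languages: frozenset
    prior_malignancy: bool


def age_in_years(birth_date, reference_date):
    """Completed years of age at the reference date."""
    years = reference_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (reference_date.month, reference_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def is_eligible(child, reference_date):
    """Apply the four stated criteria: residence, age < 15, language, no prior malignancy."""
    return (
        child.in_study_area
        and age_in_years(child.birth_date, reference_date) < 15
        and bool(child.parent_languages & {"English", "Spanish"})
        and not child.prior_malignancy
    )
```

For a matched control, `reference_date` would be the date corresponding to the diagnosis date of the matched case.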
Statewide birth records have been computerized in California since 1960. The record for each case born in California was identified and was used to select four potential controls. In phase 1, birth certificate controls were matched to cases born in the study area of 17 Bay Area counties on date of birth, gender, Hispanic status (either parent Hispanic), maternal race, and maternal county of residence, as listed on the birth certificate. The procedure was identical for cases born in California counties that are not part of the NCCLS study area (5 percent of all cases), except that county of residence at diagnosis was used for matching. For cases born outside of California (7 percent of cases), information on maternal race and Hispanic status was assessed by hospital staff rather than from the birth certificate, and the county of residence at diagnosis was used for matching.
The birth certificates for the four potential birth certificate controls and the case (if born in California) were obtained from the Center for Health Statistics. One of the four birth certificate controls was randomly selected as the first potential control to be recruited for the study. Using personal identifiers from the birth certificate, such as names and addresses, we located potential controls by using reverse directories and commercially available Internet-based search tools. Professional interviewers contacted each family using standardized searching protocols (e.g., calling a set number of times during prescribed hours). In some instances, relatives or neighbors were contacted in an effort to reach a family. If telephone contact was unsuccessful and there was a likely current address, contact was made by a letter and/or home visit. If the first-choice control could not be located, was ineligible, or declined to participate, the next randomly selected potential control was pursued. This procedure was repeated until an eligible and consenting birth certificate control was enrolled in the study. If no control was enrolled by using the first set of four birth certificates, additional certificates were requested from the Center for Health Statistics and the process described above was repeated.
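The sequential recruitment procedure described above (pursue randomly ordered candidates from a batch of four certificates, then request a further batch if none is enrolled) can be sketched as follows. This is an illustrative outline under stated assumptions: `request_batch` and `try_enroll` are hypothetical callables standing in for the certificate request and the tracing/consent process, and the `max_batches` cap is an assumption added so the loop always terminates; the paper states only that the process was repeated until a control was enrolled.

```python
import random


def recruit_birth_certificate_control(request_batch, try_enroll, rng, max_batches=5):
    """Return (enrolled candidate or None, number of searches conducted).

    request_batch: hypothetical callable returning the next batch of candidates.
    try_enroll: hypothetical callable; True if the candidate is located,
                eligible, and consents.
    max_batches: assumed safety cap, not part of the published protocol.
    """
    searches = 0
    for _ in range(max_batches):
        batch = list(request_batch())
        rng.shuffle(batch)  # pursue candidates in random order
        for candidate in batch:
            searches += 1
            if try_enroll(candidate):
                return candidate, searches
    return None, searches
```

The returned search count corresponds to the "number of searches conducted for each enrolled birth certificate control" that the study recorded.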
The families of the cases were each asked to nominate three children who met the matching and eligibility criteria listed above and whose parents might be willing to participate in the study. The parents of one nominee were randomly selected to be contacted by telephone to introduce the study and further determine the eligibility of the potential control. If the potential friend control did not meet the criteria or refused to participate in the study, a second of the three friends was randomly selected and the protocol repeated.
Evaluation of the two different control selection strategies
The use of birth records to recruit population-based controls was not common in the 1990s. It was assumed, but unknown, that birth certificate controls enrolled in the study would be representative of the population base from which the cases arose. Furthermore, contrary to some assumptions (7), recruitment of friend controls in our study posed many unexpected logistical problems. Consequently, in 1999, we evaluated the methodological aspects of the two control selection strategies. By that time, both a birth certificate control and a friend control had been selected and interviewed for 64 cases. In addition, we used birth records to randomly select 192 "ideal" controls from among children who had been born in the 17-county study area, matched 3:1 to the 64 cases on date of birth, gender, Hispanic status, and maternal race. These controls were not matched on county of residence; beginning with phase 2 (late 1999), county of residence has not been used as a matching factor because of concerns about overmatching. These controls were considered ideal because they were exactly population based, did not need to be traced, and therefore were not subject to attrition because of the inability to locate them. Most matching and eligibility criteria could be directly assessed for these children (i.e., birth date, race, Hispanic status, birth residence), although some eligibility criteria could not (current residence, previous malignancy, spoken language of the parents). In addition, these controls did not need to be contacted, which precluded the possibility of refusals. For the purposes of this evaluation, all ideal controls were assumed eligible.
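The 3:1 selection of "ideal" controls (192 controls for 64 cases) amounts to random sampling within strata defined by the matching factors. The sketch below shows one way this could be done; the record layout, the field names, and the choice to sample without reusing a control across cases are assumptions made for illustration, since the paper does not describe these details.

```python
import random
from collections import defaultdict

# Matching factors named in the text; the dictionary keys are hypothetical.
MATCH_KEYS = ("birth_date", "gender", "hispanic", "maternal_race")


def stratum_key(record):
    return tuple(record[k] for k in MATCH_KEYS)


def select_ideal_controls(cases, birth_records, ratio=3, seed=None):
    """Randomly pick `ratio` matched birth records per case, without reuse."""
    rng = random.Random(seed)
    case_ids = {case["id"] for case in cases}
    pool = defaultdict(list)
    for rec in birth_records:
        if rec["id"] not in case_ids:  # a case cannot serve as a control
            pool[stratum_key(rec)].append(rec)
    matched = {}
    for case in cases:
        key = stratum_key(case)
        candidates = pool[key]
        if len(candidates) < ratio:
            raise ValueError(f"too few matching births for case {case['id']}")
        chosen = rng.sample(candidates, ratio)
        chosen_ids = {rec["id"] for rec in chosen}
        # Remove chosen records so no control is assigned to two cases (assumption).
        pool[key] = [rec for rec in candidates if rec["id"] not in chosen_ids]
        matched[case["id"]] = chosen
    return matched
```

County of residence is deliberately absent from `MATCH_KEYS`, mirroring the statement that the ideal controls were not matched on county.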
Data on parental ages, birth weight, total number of livebirths (including the newborn), total number of previous pregnancy losses (including spontaneous abortions and stillbirths), and time since last livebirth were obtained directly from the birth certificates for all 320 subjects (64 birth certificate controls, 64 friend controls, and 192 ideal controls). These variables were selected because they were recorded consistently on birth certificates (notwithstanding changes to the format of the California birth certificate over the last two decades) and some previous studies reported associations of these factors with the risk of childhood leukemia (8–11). For subjects born after 1988, information was also available for parental years of education. Enrolled birth certificate controls and friend controls were compared with the ideal controls. Of the eight variables reviewed, maternal age and paternal and maternal years of education appeared to have an approximately normal distribution. Student's t test was used to compare those three variables, while the nonparametric Wilcoxon rank-sum test was used to compare the other five variables. The eight variables were also converted into various categories, a format that is more conventional in epidemiologic studies but often reduces statistical efficiency. The chi-square test was used to assess whether the distributions of these variables were the same between different control groups.
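As an illustration of the nonparametric comparison mentioned above, the following sketch computes a two-sided Wilcoxon rank-sum test using the large-sample normal approximation. It is not the authors' analysis code: the function names are hypothetical, tied values receive midranks, and no tie correction is applied to the variance, which is a simplifying assumption.

```python
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (z, two-sided p) comparing group x with group y."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction (assumption)
    z = (w - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p
```

In the setting described here, `x` might hold a variable such as time since last livebirth for the 64 friend controls and `y` the same variable for the 192 ideal controls.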
Recruitment of population-based controls using the computerized California birth files
As of April 2003, from a total of 1,489 potential controls who had been considered, 560 birth certificate controls were enrolled in the NCCLS. Details of the various search scenarios were recorded, and the participation rate was calculated and reported (figure 1). The number of potential controls considered for each eligible and consenting case was also documented. In addition, we recorded the number of searches conducted for birth certificate controls in various age groups (0–4, 5–9, and 10–14 years) and different ethnic categories (Hispanic and non-Hispanic).
RESULTS |
---|
Recruitment of birth certificate controls has continued since the study began in 1995. As of April 2003, the overall participation rate of birth certificate controls in the NCCLS was 49.0 percent, calculated as the number of controls who had been enrolled (n = 560) divided by the number of birth certificate controls sought, excluding the confirmed and presumed ineligibles (n = 1,142) (figure 1). The participation rate for controls in the NCCLS increased from 45.1 percent in phase 1 (1995–1999) to 51.6 percent in phases 2 and 3 (1999–present) after the searching techniques were refined to develop and implement more culturally sensitive protocols. The number of searches conducted for each enrolled birth certificate control ranged from one to 16, with an average of 2.66 (2.78 per Hispanic control and 2.58 per non-Hispanic control). The average numbers of searches conducted for enrolled controls in the age groups 0–4 years, 5–9 years, and 10–14 years were 2.38, 3.14, and 2.85, respectively.
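The overall participation rate reported above can be reproduced from the two counts given in the text. The snippet below is a small worked example; the function name is hypothetical.

```python
def participation_rate(enrolled, eligible_sought):
    """Percentage of eligible controls sought who were enrolled."""
    if eligible_sought <= 0:
        raise ValueError("denominator must be positive")
    return 100 * enrolled / eligible_sought


# Counts reported in the text: 560 enrolled; 1,142 sought after excluding
# confirmed and presumed ineligibles.
overall_rate = round(participation_rate(560, 1142), 1)
```

With these inputs, `overall_rate` evaluates to 49.0, matching the reported figure.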
DISCUSSION |
---|
Many case-control studies of childhood malignancies conducted in the United States have used random digit dialing (RDD) to recruit controls (12–15). Although RDD has certain advantages (16), evidence is growing that using RDD for control selection may result in a control group biased with respect to socioeconomic status, residential stability, and number of siblings, population characteristics that may be related to the probability of a variety of exposures (12, 17, 18). Consequently, we decided not to use RDD when the NCCLS began in 1995. Recent experience has confirmed our decision (19).
During the first few years of the NCCLS, friend controls were chosen in preference to RDD controls, and we sought to evaluate the feasibility of using birth certificate and friend controls. At the outset of the study in 1995, it was unclear whether it would be possible to use birth records to enroll population-based controls, and the NCCLS was one of the first studies of childhood malignancy in the United States to use such a method. Paradoxically, at that time, it was believed that friend controls would be relatively convenient and less expensive to recruit (7). This evaluation of control selection strategies using population-based "ideal" controls indicates that, in addition to logistic difficulties (especially with our Hispanic study population), friend controls are less useful, a finding consistent with earlier assessments (20, 21). The total number of previous pregnancy losses and time since last livebirth were significantly different between the friend and ideal controls but not between the population-based birth certificate and ideal controls. Had friend controls been used as the comparison group in future case-control analyses, spurious associations between these two factors and disease risk might have been generated.
Selection of population controls from a primary base ensures that the controls are drawn from the same population as the case series (7), a major methodological advantage. The practical challenge, however, is that there is usually no readily available population roster in the United States, as opposed to many European countries. In the NCCLS, we consider birth records preferred sources of controls since we conceptualize a population base consisting of children who were born in the 17-county study area and were residents of the study area at the reference date. However, a relatively small percentage (12 percent) of cases was not born in the study area (approximately 5 percent were born in other counties in California and 7 percent outside of California). To make the population base for cases and controls comparable, these cases would need to be excluded. In all NCCLS data analyses conducted to date, we have consistently compared the results for all cases with the results for cases born in the study area. Although no significant differences have been identified, we will continue to analyze the data in this manner. If any discrepancies are noted in future analyses, we plan to report results derived from analysis of cases born in the study area as well as all cases.
The "ideal" controls included in the evaluation are less than ideal in some aspects; some may not be eligible for the study because they have moved out of the study area, do not have an English- or Spanish-speaking parent, or had a malignancy (which would be extremely rare). These aspects would be problematic if there are significant differences regarding the exposure to leukemia risk factors between potential controls who moved out of the study area and those who did not, or between potential controls who have English- or Spanish-speaking parents and those who do not. These aspects represent an inherent limitation of this approach to control selection. On the other hand, the fact that these potential controls did not need to be traced and there was no possibility of refusals still provides a useful basis for this methodological evaluation.
Participation rate is sometimes considered an indicator of the potential for selection bias. The participation rate for birth certificate controls in the NCCLS is comparable to the 48.8 percent rate reported in a recently published study of childhood leukemia in New York State that used birth certificate controls (22). Furthermore, the participation rate for controls in the NCCLS is higher than the rate reported in a recent study with a similar design, which was conducted in California with cases of sudden infant death syndrome reported during 1997–2000 (41.3 percent) (23). Although the participation rates for birth certificate controls appear low, it is important to note that participation rate is not the most relevant consideration for inference purposes. The representativeness of the participating controls or comparability to the underlying study population is far more important. Lower participation rates do not necessarily limit inferences from a study as long as the participating controls reflect the exposure distribution of the source population of the cases. Moreover, the data available for calculating participation rates are not uniform across different studies, and, because of considerable missing data, it is sometimes impossible to calculate informative participation rates. For example, the true denominator necessary to calculate participation rates for RDD controls usually cannot be determined because there is often no information about the people who screen calls, do not answer, hang up immediately, or refuse to talk to the caller.
The average number of searches that needed to be conducted to enroll a birth certificate control was somewhat higher for Hispanic children than for non-Hispanic children. Age also appeared to affect the number of searches. Fewer searches were needed for each younger birth certificate control (aged 0–4 years) than for an older one (aged 5–14 years).
Using birth certificates to select controls can be problematic if exposure or outcome variables are associated with the probability of a potential control being located. It will be important to compare the birth characteristics of potential controls who are successfully traced with the birth characteristics of those who are not located to determine whether they are systematically different. We are pursuing these additional analyses.
In summary, our experience in California indicates that friend controls may not be representative of the study population and that there may be systematic differences by ethnic group in analyses in which friend controls are used. However, we did observe that it is feasible to use birth records to successfully recruit population-based controls, despite the efforts needed to trace potential controls and the inability to locate a significant proportion of them (19.5 percent). The use of birth records, if applied appropriately, can serve as an important alternative to available control selection strategies for case-control studies of childhood diseases.
ACKNOWLEDGMENTS |
---|
The ideas and opinions expressed are those of the authors, and no endorsement by the California Department of Health Services should be inferred.
The authors thank clinical investigators at the collaborating hospitals for help in recruiting patients. They also thank the California Department of Health Services, Center for Health Statistics, and the California Cancer Registry for providing data used in these analyses.
REFERENCES |
---|