1 Epidemiology Service, Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY.
2 Department of Obstetrics and Gynecology, New York University Medical Center, New York, NY.
ABSTRACT |
---|
case-control studies; databases; epidemiologic methods; socioeconomic factors
INTRODUCTION |
---|
Companies such as Claritas Inc. (San Diego, California) have developed systems that classify neighborhoods into "lifestyle" clusters. Marketers and charities use these systems to describe their customers or donors. The PRIZM system developed by Claritas Inc. categorizes every zip+4 postal code microneighborhood in the United States into one of 62 clusters defined by such factors as residents' income and education, age of head of household, household size, length of residence, race, foreign birth, population and housing density, home ownership/rental, and home value. These clusters are based on the US census and are augmented by information from surveys and data on consumer purchases, for example, from the use of credit cards. The 62 PRIZM clusters can be collapsed into 15 larger groups defined by socioeconomic status and type of area (urban, suburban, second city (smaller cities or satellite cities of major urban areas), small town, or rural). As examples of the type of information available on the clusters, residents of Winner's Circle areas, within the Elite Suburbs socioeconomic group, are described as executive suburban families with a head of household aged 35 to 64 years and a median income of $90,700 that have a passport and read epicurean magazines. Residents of Old Yankee Rows, within the Urban Midscale group, are described as empty-nest, middle-class families with a head of household aged 25 to 34 or >65 years and a median income of $34,600 that belong to a union and buy pop music. The clusters and broader socioeconomic groups are listed and described in table 1. More information can be found at www.claritasexpress.com.
MATERIALS AND METHODS |
---|
Controls recruited from the commercial database
We obtained patients' zip+4 codes and sent them to Experian (Allen, Texas), a company that owns a commercial database that classifies households by PRIZM codes. Of the 301 zip+4 codes sent to Experian, 286 were included in Experian's database and were assigned lifestyle clusters. Our patients were found in 40 of the 62 clusters, with the highest concentrations in the following clusters: Winner's Circle, 10.1 percent; Money and Brains, 9.4 percent; Old Yankee Rows, 8.7 percent; and Urban Gold Coast, 8.0 percent. About three-quarters of the cases were included in 13 of the clusters. By using the distribution of our cases across the clusters, Experian sampled from the database and provided us with a list of 1,503 women with the same distribution across lifestyle clusters as our cases and living in the same counties. All households in this database have listed telephone numbers. Between July 1997 and January 1998, we randomly selected about 20 names per week from this sampling frame, for a total of 421 names. We sent a letter to each woman selected, explaining the purpose of the study and that we would follow up with a telephone call. The letters were written on hospital letterhead and were signed by the study's principal investigator; the envelopes were hand addressed and stamped. These procedures are recommended for increasing response rates to mail surveys (3).
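The frequency matching described above can be sketched as a proportional draw from a sampling frame. This is only a minimal illustration, not Experian's procedure: the function name, the data layout, and the toy cluster labels are assumptions, and the sketch ignores the additional restriction to the cases' counties.

```python
import random
from collections import Counter


def frequency_matched_sample(case_clusters, frame, n_total, seed=0):
    """Draw about n_total people so that cluster shares mirror the cases.

    case_clusters: list with one cluster label per case.
    frame: dict mapping cluster label -> list of candidate identifiers.
    (Both structures are hypothetical, chosen for illustration.)
    """
    rng = random.Random(seed)
    case_counts = Counter(case_clusters)
    n_cases = len(case_clusters)
    sample = []
    for cluster, count in case_counts.items():
        # Quota proportional to the cluster's share among the cases.
        quota = round(n_total * count / n_cases)
        candidates = frame.get(cluster, [])
        # Cannot draw more people than the frame holds for this cluster.
        sample.extend(rng.sample(candidates, min(quota, len(candidates))))
    return sample


# Toy data: 10 cases in three made-up clusters, 100 candidates per cluster.
cases = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
frame = {label: [(label, i) for i in range(100)] for label in "ABC"}
sample = frequency_matched_sample(cases, frame, n_total=50)
sample_counts = Counter(label for label, _ in sample)
print(sample_counts)
```

With these toy inputs the draw contains 25, 15, and 10 candidates from clusters A, B, and C, matching the 50, 30, and 20 percent shares among the cases.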
Controls recruited from random digit dialing
Roper Starch Worldwide Inc. (Princeton, New Jersey), a company that specializes in telephone survey research, conducted the random digit dialing. We provided Roper Starch Worldwide Inc. with the telephone numbers (minus the last three digits) of the 301 cases and age quotas by 5-year age groups based on the age distribution of the cases. Modified random digit dialing was used. Roper Starch Worldwide Inc. generated a list of numbers that began with the same first seven digits as the cases' 10-digit telephone numbers. After the sample was drawn, the company used a computer program to cross-check the numbers selected against listings in the yellow pages of the telephone book to eliminate business numbers and then automatically dialed the remaining telephone numbers, eliminating those that triggered a message that the number was nonworking. This procedure was repeated every 4 to 6 weeks during the study. Between February and September 1997, interviewers at Roper Starch Worldwide Inc. called randomly selected numbers up to 16 times, using a computer-generated algorithm to distribute callbacks over different days and times. Interviewers administered a very brief questionnaire, ascertaining whether there was a woman in the household who was eligible in terms of age and, if so, obtaining her name and address as well as the best time to call.
We received the names of 298 age-eligible women from Roper Starch Worldwide Inc. over an 8-month period. Because of time constraints, we telephoned only 231 of these women. The 67 women who were not telephoned by our interviewers were mainly those who had been contacted initially by Roper Starch Worldwide Inc. during the last 2 months of the project. Roper Starch Worldwide Inc. called 1,637 telephone numbers, of which 200 (12 percent) were of unknown usability (i.e., not answered after 16 tries) and 90 (5 percent) were known to be for households but eligibility could not be determined in the time frame of the study. The response rate for this phase of the study was 72.2 percent, calculated as the number of calls completed (those eligible plus those ineligible) divided by the number completed plus the number of women who refused.
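The response rate definition given above can be written out as a short function. The underlying counts are not reported in the text, so the numbers in this sketch are made up purely to show the calculation; the function and parameter names are likewise assumptions.

```python
def response_rate(completed_eligible, completed_ineligible, refused):
    """Completed screening calls divided by completed calls plus refusals."""
    completed = completed_eligible + completed_ineligible
    return completed / (completed + refused)


# Made-up counts for illustration only; they are not study data.
rate = response_rate(completed_eligible=30, completed_ineligible=50, refused=20)
print(f"{rate:.1%}")  # 80.0%
```

Under this definition, numbers of unknown usability and households whose eligibility could not be determined do not enter the denominator.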
Interviewer contacts
Interviewers employed and supervised by the Epidemiology Service at Memorial Sloan-Kettering Cancer Center made telephone calls to potential controls from both sources. The interviewers obtained preliminary verbal consent, mailed the consent form, and, after the signed form was returned, called again to schedule the interview. The interview, conducted by telephone, took on average 68 minutes to complete. The consent form included consent to give both blood for genetic testing and saliva; however, we included participants whether or not they agreed to give biologic specimens. We paid respondents in the control groups $50 for their participation. All procedures and instruments were approved by the Institutional Review Board.
RESULTS |
---|
The response rates for the two types of controls after they received the consent form are shown in table 3; rates were the same for the two groups, about 45 percent. The overall response rate for the controls recruited from the commercial database was 91/350, or 26.0 percent. When the response rate from the initial telephone calls to locate eligible respondents was taken into account, the overall response rate for controls located by using random digit dialing was 28.1 percent ((90/231) x 72.2 percent). The proportion of controls who gave blood was similar in the two groups, 76.8 percent for the commercial database and 72.2 percent for random digit dialing (data not shown in tables).
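The arithmetic behind the two overall response rates can be checked with a few lines. All figures are taken from the paragraph above; only the variable names are ours.

```python
# Commercial database: 91/350, as given in the text.
commercial_rate = 91 / 350

# Random digit dialing: 90/231, scaled by the 72.2 percent response rate
# of the initial screening calls.
screening_rate = 0.722
rdd_rate = (90 / 231) * screening_rate

print(f"{commercial_rate:.1%}")  # 26.0%
print(f"{rdd_rate:.1%}")  # 28.1%
```

The random digit dialing figure is a product of two stages because women could be lost either at the screening call or after being approached by the study interviewers.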
Characteristics of participants
Demographic characteristics of the controls identified from the commercial database and from random digit dialing who completed the study (excluding nine women from the commercial database who signed the consent form but could not be interviewed before the recruitment period ended) are shown in table 4. Women in the two control groups were similar in terms of age, race, and religion. Those identified by using the commercial database were of somewhat higher socioeconomic status as indicated by measures of education and income, although these differences were not statistically significant. There was a large and significant difference in area of residence; more commercial database controls lived in New Jersey and Connecticut and more random digit dialing controls in New York City and New York State.
Because the sampling frame for the commercial database included only women who had listed telephone numbers, we were interested in determining how many of the women located by random digit dialing had unlisted numbers. We determined whether numbers were listed by looking them up in telephone books or on the Internet or by calling directory assistance. We used the same procedures to locate the telephone numbers of controls selected by using the commercial database to account for women who, after the database was assembled, might have changed their names or addresses or requested that their numbers be unlisted. We did not attempt to look up telephone numbers for the cases, since they had been identified earlier (October 1993 to December 1996) and we did not have access to all telephone books from those years. As expected, more controls identified from random digit dialing had unlisted telephone numbers, 27 versus 5 percent.
Demographic characteristics of cases are also shown in table 4. By design, we intended the characteristics of cases and controls to be similar, with the exception of religion. We found cases to be similar to both control groups in terms of age and race. Their education and income levels were similar to those of the controls recruited from the commercial database and higher than those of the controls identified by using random digit dialing. Cases were more likely to refuse to answer the question on income or to say they didn't know. Their geographic distribution was similar to that of the random digit dialing control group. Cases were more likely to be Jewish.
We compared the commercial database and random digit dialing control groups in terms of their use of oral contraceptives and parity, two factors related to risk of ovarian cancer (table 5). The two groups were very similar in terms of oral contraceptive use and the percentage of women who were nulliparous, but the commercial database controls were much more likely to have two or more children. As we expected, there were substantial differences between cases and each control group; cases were more likely to be nulliparous and less likely to have used oral contraceptives.
Participation of cases and commercial database controls by area and socioeconomic status
The availability of lifestyle cluster codes for the women in the commercial database sampling frame and for cases enabled us to compare results of recruitment attempts according to the groups based on area and socioeconomic status (table 6). Among the controls identified from the commercial database, about one-third of the women in the Elite Suburbs and the Landed Gentry groups who were approached signed the consent form; in contrast, only 15 percent of those in the Urban Midscale group did so. Among the cases we approached, the percentage of women who consented was lowest in the Second City group. The largest differences between cases and controls were in the Urban Midscale and all other socioeconomic groups, those representing less-affluent areas, in which the proportion of women who consented was much higher among cases than among controls.
Completeness of the sampling frame for the commercial database
An additional analysis was undertaken to evaluate the completeness of the commercial database as a source of controls. We did so by determining how many of the 301 patients on our original list were included in the commercial database. The list owner provided us with a file of all of the 1,216 women listed who lived in the zip+4 areas in which our patients lived. We found that 85 (28.2 percent) of the 301 cases appeared on this list. The 301 patients lived in 253 different zip+4 areas. In these areas, the number of households on Experian's list ranged from 1 to 21, with a mean of 4.8 and a median of 4. Since some of the women on the case list might not have been included in the database because they had moved or died, we also analyzed the proportion of households with women identified by random digit dialing who were included in the commercial database. Experian conducted this analysis by computer matching the telephone numbers from random digit dialing to their database. Of those households in which the random digit dialing study determined that there was an adult woman (n = 466), 36.3 percent of them were in the commercial database. Since about 27 percent of the study women from the random digit dialing control group had unlisted telephone numbers (table 4), a maximum of about 73 percent could have been included on the commercial database list.
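The coverage figures in this paragraph can be reproduced as follows. The last step, which relates the observed 36.3 percent to the 73 percent ceiling, is our own rough illustration rather than a result from the study; it assumes that the 27 percent unlisted share observed among the random digit dialing controls applies to all 466 households.

```python
# Share of cases found on the commercial list (85 of 301, from the text).
coverage_cases = 85 / 301

# Ceiling on coverage if about 27 percent of numbers are unlisted.
unlisted_share = 0.27
max_possible = 1 - unlisted_share

# Observed coverage among the 466 random digit dialing households.
observed_rdd = 0.363

# Assumption-laden illustration: observed coverage relative to the ceiling.
relative_to_ceiling = observed_rdd / max_possible

print(f"{coverage_cases:.1%}")  # 28.2%
print(f"{max_possible:.0%}")  # 73%
print(f"{relative_to_ceiling:.0%}")  # 50%
```

Under that assumption, the database would contain roughly half of the households that could have appeared on it.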
Costs
Obtaining the Experian mailing list of 1,503 names with the same distribution across the lifestyle clusters as our cases cost $1,500. The cost to send 421 letters included about 10 person-days for drawing the weekly samples, producing the letters and envelopes, and postage: about $1,600. Having Roper Starch Worldwide Inc. provide the names of 298 eligible women cost $22,000. Since we did not use all names provided by Roper Starch Worldwide Inc. before the data collection phase ended, we prorated this amount to estimate the cost of the 231 names we actually used; the prorated cost was $17,050. The cost of obtaining names for potential respondents from the commercial database was therefore 18 percent of the cost of obtaining names by using random digit dialing.
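The cost comparison can be verified with the figures quoted above. The variable names are ours, and the proration is assumed to be linear in the number of names used, which is what the quoted amounts imply.

```python
# Random digit dialing: $22,000 for 298 names, of which 231 were used.
rdd_total_cost = 22_000
names_provided = 298
names_used = 231
rdd_prorated = rdd_total_cost * names_used / names_provided

# Commercial database: $1,500 for the list plus about $1,600 for mailing.
commercial_cost = 1_500 + 1_600

cost_ratio = commercial_cost / rdd_prorated

print(f"${rdd_prorated:,.0f}")  # $17,054
print(f"{cost_ratio:.0%}")  # 18%
```

The exact prorated amount is about $17,054, which the text rounds to $17,050; the ratio is 18 percent either way.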
We investigated whether it took longer for our interviewers to reach controls recruited from the commercial database list than those recruited by random digit dialing, whom Roper Starch Worldwide Inc. had already contacted by telephone. The mean number of days on which potential respondents were called was similar for the two groups: 3.0 (standard deviation, 2.5) for the commercial database and 2.7 (standard deviation, 2.2) for random digit dialing. Any additional cost involved in reaching controls identified from the commercial database compared with random digit dialing appears to be minor.
DISCUSSION |
---|
A disadvantage of using the commercial database was that the sampling frame did not include most of the cases, indicating that the list contains a relatively small proportion of the source population. While part of the reason that we could not locate our cases on this list may have been that they had died, had moved, or had unlisted telephone numbers, results for the households identified by random digit dialing confirmed the incompleteness of the commercial database sampling frame. This problem might be overcome in future studies if other databases, or perhaps a combination of databases, were used. In addition, study investigators who use a commercial database might check before purchasing one to determine what proportion of their cases is included on the list being considered. In contrast, the sampling frame for random digit dialing by definition includes all cases, since it is based on their telephone numbers.
Another potential disadvantage of the commercial database was that a large proportion of respondents were residents of New Jersey or Connecticut rather than New York. While they were similar to cases in terms of other demographic measures, those respondents who lived farther away from New York City might have been less likely to use our medical center had they been diagnosed with ovarian cancer. In addition, the geographic distribution of these controls appeared to confound the relation between parity and case-control status. We could have averted this problem if we had frequency-matched the commercial database list by county as well as by lifestyle cluster. While a fairly large number (16.9 percent) of women on the list were ineligible to participate, indicating that the list was out of date, this problem was minor because the amount of time and the cost of determining that these women were ineligible were negligible.
Overall response rates were low for both control groups. The procedures for this particular study are likely to have discouraged participation: we required signed informed consent forms before the interview, conducted a long interview, and requested blood and saliva samples. A large proportion of women who considered participating refused after reading the consent form, which was long and drew attention to the potential risks of genetic testing. There is general agreement in the epidemiologic community that response rates have declined (4). Social and economic factors such as families in which all adults work, the prevalence of telemarketing, and the use of answering machines and caller identification to screen telephone calls are likely to affect response. These problems may be particularly severe in urban areas in which tertiary centers are located. Although the New York area is an attractive setting for epidemiologic research because of the concentration and diversity of the population, these social and economic factors may be particularly important here.
Low response rates raise the potential for biased results if those persons who respond are different from those who do not. In most situations, including those in which random digit dialing is used, little information is available on the characteristics of persons who do not respond, so it is difficult to evaluate the potential for bias. Use of a commercial database with PRIZM codes assigned to cases and controls enabled us to compare responses according to broad socioeconomic groups. We found that the greatest discrepancy between cases and controls recruited from the commercial database was in the participation rates of women in somewhat lower socioeconomic groups. This finding is consistent with other studies that have evaluated characteristics of persons who respond and those who do not (5, 6). This information on cases and controls would enable investigators to adjust results for nonresponse, which is not possible in most epidemiologic studies.
We know of only one other report of a novel strategy for recruiting controls for case-control studies when cases come from a tertiary center. Hudmon et al. (7) recruited controls who were smokers from a large, multispecialty health maintenance organization, which did not strictly represent the source population for their cases. They gave screening questionnaires to patients at the health maintenance organization and included a question on their willingness to take part in a study. Although these authors were unable to assess the proportion of patients who completed the screening questionnaires, they reported that about three-quarters of those who did answered yes or maybe to the question about participating in a future study and that they were able to recruit 87 percent of those patients for a study.
Because of the degree to which the demographic characteristics of the respondents recruited from the commercial database were similar to those of the cases and the lower cost of obtaining these controls, we conclude that commercial databases can provide an alternative to random digit dialing. However, in future studies in which this source of controls is used, an attempt should be made to find a more complete database or a combination of databases that more closely resembles the source population for the cases. The problem of lower response rates needs to be addressed by the epidemiologic community.
ACKNOWLEDGMENTS |
---|
The authors thank the interviewers Christine Nakraseive, Monica Melo, and Lauren McGuinn.
NOTES |
---|
REFERENCES |
---|