1 Department of Biobehavioral Health, College of Health and Human Development, The Pennsylvania State University, University Park, PA.
2 Center for Developmental and Health Genetics, College of Health and Human Development, The Pennsylvania State University, University Park, PA.
3 Center for Opinion Research, Millersville University, Millersville, PA.
Received for publication September 24, 2001; accepted for publication February 24, 2002.
ABSTRACT |
---|
data collection; ethics; genetics; informed consent; smoking; telephone
Abbreviations: CI, confidence interval; RDD, random digit dialing; SD, standard deviation.
INTRODUCTION |
---|
Cigarette smoking is a complex biobehavioral activity (2). Many studies have found substantial genetic influences on smoking, with heritability estimates ranging from 30 percent to 70 percent (3-14). Of course, these findings also indicate that 30-70 percent of the variability in smoking is attributable to nongenetic, environmental effects. Estimates of genetic effects depend on the particular environmental contexts of these studies. With the development of highly polymorphic genetic markers that are widely scattered throughout the human genome, allele-sharing and association study methods have been increasingly used in studies designed to identify new loci that influence complex traits.
Representative samples
While it is sometimes useful to have nonrepresentative samples (e.g., persons with early onset of disease or severe phenotypes), sampling issues can be important in genetic studies of complex traits. With association studies, strategies for obtaining random samples of individuals are needed in order to avoid false associations arising from samples that introduce potential confounding through stratification. Survey research methods can be linked with procedures that allow DNA samples to be collected with minimal intrusion using a buccal swab or rinse and returned by mail (15-17). For studies designed explicitly and solely to detect effects of loci, certain nonrandom sampling strategies (e.g., selection for extreme phenotypes) can be efficient. Random sampling is desirable in other contexts, such as evaluation of the broader, multivariate constellation of genetic and environmental effects, their covariation, and their interactions, or a broader study of multiple phenotypes in which selection on the basis of a single trait is not appropriate.
There are various sampling procedures that can create more representative samples (18). RDD computer-assisted telephone interview surveys are widely used in health research. Despite well-understood deficiencies, RDD computer-assisted telephone interviews are accepted for their ability to gather data substantially similar to those obtained from personal interviews on reasonably representative samples of the US population (19, 20). In case-control epidemiologic research, investigators often use RDD techniques to gather data from a control group that has been selected from a more representative population (21-23).
Informed consent and terms of participation
The creation of a registry, especially a sibling registry, can be an important, cost-effective tool for genetic research. Issues of privacy and access to genetic information are particularly sensitive and have been a central focus of ethical debates on participation of human subjects in genetic research (24-30).
Would possible participants be reluctant to consent to becoming part of such a registry? Would the individuals who joined such a registry be different from individuals who agreed to participate in a so-called "made-anonymous" study? In a made-anonymous study, once the data have been collected, individual identifiers are discarded. Middleton et al. (31) gave participants options on the consent form for use of their DNA sample in future research studies: 1) no future use (6 percent selected this option); 2) use if the sample were made anonymous (26 percent); 3) use if the participant were recontacted beforehand (73 percent); and 4) no restrictions on use of identifiable samples (6 percent) (31). It is unclear how participants would respond when presented with only one option for participation.
In the Smokers and Nonsmokers Study, we used an experimental design to evaluate the effects on participant characteristics of offering half of the persons receiving mailings the opportunity to participate in a "made-anonymous" study and the other half the opportunity to take part in a "registry" study in which we would maintain their names and addresses for further contact. We also assessed the feasibility of obtaining samples from siblings. Overall, we evaluated potential biases by exploring which factors (e.g., sex, age, smoking status, education, geographic region, and type of invitation (registry vs. made-anonymous)) predicted response in our study.
MATERIALS AND METHODS |
---|
The mail survey
A packet containing detailed informed consent documents, buccal-sample collection kits, $5.00 cash, and a pen was sent by US Postal Service Priority Mail, along with a stamped Priority Mail envelope for return. Tokens such as the pen and small payment have been shown to improve cooperation rates (32). Priority Mail was chosen to expedite delivery and to help distinguish the study mailing from "junk" mail. The mailing included a personal letter of invitation signed by the investigators and a form requesting, at the individual's option, the names and telephone numbers of siblings (persons with the same biologic mother and father) who were at least 18 years of age and living in the continental United States. Scheduled follow-up telephone interviews were used to answer questions about the informed consent forms and the procedure for collecting buccal swabs.
DNA collection
The buccal-swab kit consisted of two plastic-shaft cotton-tipped swabs with one end removed inside two plastic Ziploc bags (S. C. Johnson and Son, Inc., Racine, Wisconsin). Written instructions asked participants to use the two cotton swabs provided in the kit. Participants were instructed to rub on the inside of their cheeks with each swab for 1 minute, using a firm but gentle motion, and then place the swabs inside the smaller plastic bag. This bag was then to be placed in the larger plastic bag to minimize risk of leakage. Subjects were instructed to return the buccal swabs via Priority Mail using the prepaid envelope provided with the kit. Studies that have tested the stability of buccal-cell DNA over extended periods of time (16, 17) and that have used mail as a transportation method (15) have shown this method to be reliable for collecting DNA that can be genotyped by polymerase chain reaction. Proposals for various modifications of these techniques have been published, but they all yield small and widely varying amounts of DNA, in the range of 1-50 µg (15). The samples were genotyped for a dopamine transporter (SLC6A3) marker (1, 33).
Participants were sent a reminder postcard 8-10 days after their interviews and were sent replacement kits when necessary. A toll-free telephone number was provided for questions. The institutional review board of the Pennsylvania State University approved the protocol. A Certificate of Confidentiality was obtained from the US Department of Health and Human Services.
"Made-anonymous" group versus "registry" group
Participants were randomly assigned to either a "made-anonymous" condition or a "registry" condition by computer prior to the initial contact. Telephone interviewers were unaware of the random manipulation. For participants assigned to the made-anonymous group, the informed consent form stated that once technicians had extracted enough DNA from the buccal swab for analysis, their names and code numbers would be stripped from the database so that there would be no way of tracing a given sample to a particular participant. If participants provided the names of siblings who later participated in the study, the removal of personal identifiers was delayed until the DNA samples of the siblings had been tested, but then all names and code numbers were removed from the database. Samples and data were given new unique numbers that linked siblings yet maintained confidentiality. Made-anonymous participants were asked whether their samples could be used in future genetic studies.
Participants assigned to the registry group were informed in the consent document that their names would be linked to their DNA sample and that they might be contacted for permission to use the sample in future genetic analyses. However, they were told that participation was confidential and that no information would be released without their written consent. The consent form also noted that a Certificate of Confidentiality had been issued by the US Department of Health and Human Services because of the potential sensitivity of smoking- and health-related information combined with genetic data. Participants were further informed that, after 5 years, all personal identifiers would be removed from the data set.
Nomination of siblings
An interviewee (proband) could nominate his or her siblings by providing their names and telephone numbers. Siblings were contacted by interviewers; if they agreed to participate, they completed the same procedure as outlined above. Siblings were assigned to the same group (registry or made-anonymous) as the proband.
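The assignment rules described above (computer randomization of probands before first contact, with siblings inheriting the proband's condition) can be sketched in a few lines. This is only an illustration under stated assumptions: the source does not say which randomization scheme or software was used, so simple (unblocked) randomization is assumed here, and the names `assign_probands` and `assign_sibling` are hypothetical.

```python
import random

# The two consent conditions named in the study.
CONDITIONS = ("made-anonymous", "registry")


def assign_probands(proband_ids, seed=None):
    """Assign each proband to one condition before any contact is made.

    Assumption: simple randomization with equal probability; the paper
    does not specify the actual scheme.
    """
    rng = random.Random(seed)  # seeded generator so an assignment can be reproduced
    return {pid: rng.choice(CONDITIONS) for pid in proband_ids}


def assign_sibling(proband_id, assignments):
    """A nominated sibling receives the same condition as the proband."""
    return assignments[proband_id]


# Hypothetical identifiers, for illustration only.
assignments = assign_probands(["P001", "P002", "P003"], seed=2001)
sibling_condition = assign_sibling("P002", assignments)
```

Because assignment is fixed before the telephone interview, interviewers can remain unaware of the condition, as the protocol requires.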
Statistical analyses
The StatView statistical package (SAS Institute, Inc., Cary, North Carolina) was used for all analyses. Simple descriptive statistics and cross-tabulations were used to profile the sample with regard to race, age, gender, educational level, smoking status, and symptoms of depression. Logistic regression techniques were used for multivariate analyses.
RESULTS |
---|
Demographic data for different stages of participation
The demographic characteristics of the sample at all points of participation are shown in table 2. To assess how representative our sample was, we compared our results with the weighted results from the 1998 National Health Interview Survey (35). For all comparisons, our results were statistically different (ps < 0.05) from the National Health Interview Survey estimates, but the pattern of results showed broad similarity. The National Health Interview Survey uses a three-level categorization of smoking status: current smoker, former smoker, and nonsmoker (0-100 cigarettes in a lifetime). Our sample contained fewer current smokers and more nonsmokers, but the percentages were generally similar, and the sample was probably close enough to the national estimates to have practical utility.
Thirty-eight mailed kits (2 percent) came back marked "Return to Sender," indicating a problem with the address. The contents of four returned kits were unusable because of spoilage or contamination. Another four returned kits did not contain buccal swabs. Nine kits were returned without compliant consent forms (the forms were missing, not signed, or not dated). The mean yield of DNA from each pair of usable buccal swabs was 16.4 µg (SD 15.0), with a range of 0-174 µg. Females (15.0 µg (SD 13.5)) yielded less DNA than did males (18.3 µg (SD 16.9)) (log + 1 transformation; F (df = 1, 863) = 7.03, p = 0.008). DNA yield was unrelated to turnaround time (r = 0.02, p = 0.566), and successful genotyping was also unrelated to turnaround time (F (df = 1, 846) = 0.635, p = 0.426).
Nominating siblings
Overall, 302 participants nominated at least one sibling (median, two siblings; range, 1-11 siblings). This response represents 8.9 percent of the persons interviewed and 34.7 percent of those who returned a buccal-swab kit. Multiple logistic regression analysis, restricted to participants with siblings who returned kits, was used to assess predictors of nominating siblings (figure 3). Having a college degree, formerly smoking, being in the registry group, and having siblings who smoked were significant predictors.
Effects of registry versus made-anonymous grouping
Participants assigned to the registry condition were more likely to nominate siblings than participants assigned to the made-anonymous condition (54.6 percent vs. 45.4 percent) (χ² (1 df) = 11.37, p = 0.0007). In multiple logistic regression analysis controlling for age, sex, education, and smoking status, registry participants were more likely to provide names of siblings (odds ratio = 1.58, 95 percent CI: 1.16, 2.14; p = 0.004) (figure 3). Numbers of siblings nominated did not differ between groups (t (297 df) = 0.99, p = 0.324). A significant cross-product interaction was found (odds ratio = 1.97, 95 percent CI: 1.21, 3.20; p = 0.006) indicating that current smokers were less likely than members of the other smoking categories (former smokers, nonsmokers, and never smokers) to return kits if they were assigned to the registry condition (26.9 percent of the registry current smokers returned kits vs. 39.6 percent of the made-anonymous current smokers). (This interaction is in addition to the main effect of current smokers being less likely to return kits in both the registry group and the made-anonymous group.)
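As a reading aid, the reported odds ratios, confidence intervals, and p values can be checked against one another. The sketch below assumes the intervals are Wald intervals computed on the log-odds scale, which is the usual logistic regression output but is not stated in the text; the function name `wald_summary` is hypothetical and is not part of the authors' analysis.

```python
import math


def wald_summary(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the standard error, z statistic, and two-sided p value
    implied by an odds ratio and its 95 percent Wald confidence interval.

    Assumption: the interval is exp(log(OR) +/- z_crit * SE).
    """
    # The interval spans 2 * z_crit standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Reported values for the registry effect on nominating siblings.
se_main, z_main, p_main = wald_summary(1.58, 1.16, 2.14)

# Reported values for the registry-by-current-smoker interaction.
se_int, z_int, p_int = wald_summary(1.97, 1.21, 3.20)
```

Under that assumption the implied p values come out at roughly 0.003 and 0.006, close to the reported 0.004 and 0.006; small gaps are expected because the published estimates are rounded to two decimal places.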
Future use of DNA
Made-anonymous participants were given the option to decline the use of their genetic material in future studies. When asked whether future testing of their samples related to smoking could be conducted by the experimenters, only 13 percent said "no." Those who said "no" did not differ from those who said "yes" in terms of race, gender, age, educational level, or smoking status. However, after data were controlled for these variables, those who said "yes" to future testing were three times more likely to nominate a sibling for participation (odds ratio = 3.09, 95 percent CI: 1.47, 6.51; p = 0.005). Among the siblings, only four made-anonymous participants (6.5 percent) said "no" to future genetic testing of their samples.
DISCUSSION |
---|
All sampling schemes have biases and limitations. The initial RDD stage of this technique allows collection of some data (age, sex, education, etc.) on the persons approached, and such information can sometimes be used to make subsequent statistical adjustments for nonresponse (36). The collection of initial survey data is a very useful element of this procedure. The more response rates decline, the more likely it is that sample bias will be a problem. We do not know what unmeasured variables, if any (e.g., personality), might be responsible for the reduced participation at each stage; it is possible that random influences caused much of the attrition.
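One common form of the nonresponse adjustment mentioned above is a weighting-class adjustment: respondents in each cell defined by variables collected at the RDD stage are weighted by the inverse of that cell's response rate. The sketch below is a generic illustration, not the authors' procedure; the function name `nonresponse_weights` and all counts are hypothetical placeholders, not study data.

```python
def nonresponse_weights(approached, responded):
    """Weighting-class adjustment: weight = approached / responded per cell.

    Both arguments map a cell label (e.g., a sex or education category
    recorded in the initial telephone interview) to a count of persons.
    """
    weights = {}
    for cell, n_approached in approached.items():
        n_responded = responded.get(cell, 0)
        if n_responded == 0:
            # An empty cell cannot be weighted up; it must be merged with another.
            raise ValueError(f"no respondents in cell {cell!r}; collapse cells first")
        weights[cell] = n_approached / n_responded
    return weights


# Hypothetical counts, for illustration only (NOT taken from the study).
approached = {"male": 400, "female": 600}
responded = {"male": 80, "female": 180}
weights = nonresponse_weights(approached, responded)

# After weighting, respondents in each cell stand in for everyone approached.
weighted_total = sum(weights[c] * responded[c] for c in responded)
```

Such weights correct only for differences captured by the cell variables; as the text notes, unmeasured influences on participation remain unadjusted.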
Although there were measurable demographic biases in our sample as compared with the US population, our sample should have been much less biased than an advertising-based sample. It is well known that males are less likely to participate in surveys and that better-educated and older persons are more likely to participate (37). Our findings are consistent with this pattern. With this sampling technique, investigators can practically obtain national samples with a broad geographic distribution, something that might be much more challenging for a national advertising campaign. In addition, this procedure could be used to screen for research participants with specified characteristics (e.g., older persons, females, heavy smokers) in studies of targeted subgroups. Our results indicate that it would be more cost-effective to sample women than men.
The response rate suggests that many people are willing to volunteer for a study involving contribution of DNA samples to researchers they have never met. Before we conducted this study, we had little basis for estimating how the general public would react to invitations to provide DNA samples. Some of us expected that public concerns about genetic research and the risks of disclosing one's DNA might make participation rates vanishingly small. That did not happen. We think a useful probability sample was obtained.
We have been unable to identify a comparable nongenetic study (i.e., a study with national RDD telephone contact, followed by a mail survey and another telephone survey, with postcard reminders, that required a behavioral response (swabbing)) with which to compare our response rates. Therefore, we do not know how much the genetic topic may have depressed the rate of return of kits. It is possible that RDD response rates will continue to decline (36), especially with the increased use of cell phones, answering machines, and caller identification. The main intent of this project was to evaluate feasibility, not to advocate the use of this method. The goal of future studies will probably be to determine whether these response rates are adequate.
The presence of different ethical interpretations and rules in different contexts may make for significant changes in other studies of this topic. Some ethical review boards might not allow any payment of participants (even a modest payment of $5.00). Our review board required that we not keep registry samples beyond 5 years without getting further written permission. Other boards may not make such a requirement. Some boards might not allow participants to nominate siblings without their prior consent.
We did not test alternative methods of DNA collection in this report. Other techniques, such as oral rinsing techniques (38-41), may provide equal or greater DNA yields (25-50 µg); this should be explored. Buccal swabs provided us with a convenient method of collecting DNA samples that could be transmitted by mail, and the swabs generated sufficient amounts of DNA for analysis of many genotypes. The amounts of DNA obtained were not large, and this might make whole-genome scans difficult. With refinements or new techniques, greater numbers of genetic tests might be possible with these small amounts of DNA, but samples such as this may serve better for the study of specific genes rather than general searches.
Persons randomly assigned to the made-anonymous condition (a fact of which they were unaware) were more likely to agree to receive a mailing. We have no explanation as to why random assignment failed to balance agreement to receive kits between the registry group and the made-anonymous group. The random assignment procedure did work to balance age, sex, education, and smoking status between the registry and made-anonymous groups. This would not have happened if there had been a broad failure of our random assignment procedure. Careful examination of the telephone interview script indicated that neither the interviewer nor the interviewee could have been aware of the subsequent group assignment that was revealed when the mailed kit was opened. We assume that this chance finding did not have systematic, biasing effects on the remainder of the study.
Why were current smokers in the registry group less likely to return DNA samples? We speculate that these smokers were trying to avoid possible future hassles related to their smoking status. The prospect was that we could recontact them for further research with the obvious question, "Are you still smoking?" In light of all the social pressure there has been in recent years to quit smoking, smokers might prefer to avoid being asked whether they are still smoking.
Why were persons in the registry group more likely than persons in the made-anonymous group to nominate siblings? We speculate that the proposed future contact for the registry group may have led registry respondents, more than made-anonymous respondents, to feel that the study would be connected with their families. However it is explained, this difference in nominating siblings is large enough (55 percent vs. 45 percent) that it would be of practical significance. The registry group did not differ from the made-anonymous group in terms of demographic characteristics, and the registry group produced a larger sibling sample. A registry sample would also be required for longitudinal work. It is valuable to know that the made-anonymous condition did not produce greater participation than did the registry condition.
One reason to undertake research on the characterization of individual differences, whether such differences are of genetic origin or are due to other individual variables, is to develop a better understanding of basic biobehavioral mechanisms. Another reason for such research may be its potential to aid in the development of intervention strategies that are tailored to be most effective for the individual (42). This approach may seem obvious for some public health intervention programs; for example, persons with hypertension respond differentially to different pharmacologic, diet, and lifestyle intervention strategies. It is equally important to consider individual differences in response to interventions involving behavioral risk factors that affect health, such as smoking behavior. An approach like the one described here, coupling extensive surveys with DNA sample collection, may aid in further exploration of genetic contributions to behavioral risk factors and in designing interventions to address these factors.
ACKNOWLEDGMENTS |
---|
The authors thank Amy Walters, Kristy Minarsky, and Cathleen Salemme for their assistance with data collection and management; Christina Bennett, Rebecca Stauffer, and Michael Grant for their assistance with genotyping; and Christina Abbott for her assistance with the telephone interview data.
Portions of this report were presented at the 7th Annual Meeting of the Society for Research on Nicotine and Tobacco (Seattle, Washington, March 23-25, 2001).
NOTES |
---|
REFERENCES |
---|