Use of World Wide Web-based Directories for Tracing Subjects in Epidemiologic Studies

Malcolm M. Koo1 and Thomas E. Rohan1,2

1 Department of Public Health Sciences, University of Toronto, Toronto, Ontario, Canada.
2 Present address: Department of Epidemiology and Social Medicine, Albert Einstein College of Medicine, 1300 Morris Park Avenue, New York, NY 10461.


    ABSTRACT
 
The recent availability of World Wide Web-based directories has opened up a new approach for tracing subjects in epidemiologic studies. The completeness of two World Wide Web-based directories (Canada411 and InfoSpace Canada) for subject tracing was evaluated by using a randomized crossover design for 346 adults randomly selected from respondents in an ongoing cohort study. About half (56.4%) of the subjects were successfully located by using either Canada411 or InfoSpace. Of the 43.6% of the subjects who could not be located using either directory, the majority (73.5%) were female. Overall, there was no clear advantage of one directory over the other. Although Canada411 could find significantly more subjects than InfoSpace, the number of potential matches returned by Canada411 was also higher, which meant that a longer list of potential matches had to be examined before a true match could be found. One strategy to minimize the number of potential matches per true match is to first search by InfoSpace with the last name and first name, then by Canada411 with the last name and first name, and finally by InfoSpace with the last name and first initial. Internet-based searches represent a potentially useful approach to tracing subjects in epidemiologic studies. Am J Epidemiol 2000;152:889–94.

bias (epidemiology); contact tracing; epidemiologic methods; Internet; longitudinal studies

Abbreviations: WWW, World Wide Web.


    INTRODUCTION
 
Minimizing nonresponse bias is a major challenge in prospective epidemiologic studies. One possible cause of nonresponse in studies conducted by mail is loss to follow-up when study participants have changed their mailing addresses. Traditional approaches for tracing subjects lost to follow-up include checking telephone books, obtaining directory assistance, contacting participants' previous neighbors, contacting current residents of participants' former addresses, checking driver's license records, and using commercial tracing agencies (1–4).

The recent growth in popularity of the World Wide Web (WWW) has opened up new ways of exchanging information (5–8) and conducting research (9–12). Of the many tools available on the WWW, directories for finding telephone numbers and addresses with last name and first name information may have particular relevance for subject tracing in epidemiologic research. However, the completeness of these directories has not been evaluated. The objective of the study described here was to evaluate the potential utility of two World Wide Web-based directories with Canadian content (Canada411 and InfoSpace Canada) for tracing subjects in a follow-up study.


    MATERIALS AND METHODS
 
The study was conducted as a part of the first follow-up of a cohort of adults in whom dietary and lifestyle factors are being investigated. From the respondents to the mailed questionnaires, 346 subjects who resided in Ontario were randomly selected for this study. Of these, 183 (52.9 percent) were male, and 257 (74.3 percent) resided in metropolitan Toronto. A majority (85.8 percent) of the subjects resided in single-unit dwellings (i.e., not multiunit dwellings such as apartment buildings or townhouses, where a single street address can have more than one residential unit). Only those subjects who responded to our mailed questionnaire were selected for this study because their reply provided their current address, which served as the "gold standard" against which addresses obtained from the WWW-based directories could be compared.

A randomized crossover study design was used (figure 1). All subjects were searched for by using two WWW-based directories, Canada411 and InfoSpace Canada. Descriptions of these nationwide directories and several province-specific directories are presented in table 1. According to the information available on their respective websites, Canada411 obtains its data from telephone book listings (residential white pages), while InfoSpace acquires its data from Acxiom (Conway, Arkansas) (http://www.databyacxiom.com), which aggregates data from telephone books and other public records. Printed coding forms containing the names, addresses, and telephone numbers of the subjects were given to two clerks. Half of the subjects were randomized to a search by Canada411 first and then by InfoSpace, and the other half were randomized to a search in the reverse order. This strategy was used to minimize the potential bias introduced by a training effect on the clerks. Use of the WWW-based directories entails entering a subject's information (last name, first name or first initial, and province) into the appropriate search field on the Web page of the directories. Once a search request is submitted, a new Web page with the search results is returned, usually within a few seconds to a minute, depending on the traffic on the Internet and the load on the directories' database servers. All subjects were searched for according to the following sequence. First, the last name, first name, and province (Ontario) of the subjects were used as the search criteria. If a subject could not be located, then the last name, first initial, and province (Ontario) of the subjects were used. Information on whether a match was found and the total number of potential matches per search was recorded. The latter variable was recorded for calculation of the number of potential unsuccessful verifications that would be needed before the true match could be found.
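The search sequence and the randomization of directory order described above can be sketched in Python as follows. This is a minimal illustration only, not the procedure the clerks followed: the `search` callable stands in for a manual query of a directory's Web page, and `Listing`, `SearchRecord`, `trace_subject`, and `assign_orders` are hypothetical names. Treating a listing as a match when both telephone number and address agree with the known record is also an assumption.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Listing:
    name: str
    address: str
    phone: str


@dataclass(frozen=True)
class SearchRecord:
    directory: str
    query_type: str   # "first_name" or "first_initial"
    n_potential: int  # total number of potential matches returned
    matched: bool     # whether the known ("gold standard") listing was among them


# Hypothetical interface: (last name, first name or initial, province) -> listings
SearchFn = Callable[[str, str, str], List[Listing]]


def trace_subject(directory: str, search: SearchFn, last_name: str,
                  first_name: str, known: Listing,
                  province: str = "Ontario") -> List[SearchRecord]:
    """Search by first name, then by first initial only if no match was found."""
    records = []
    queries = (("first_name", first_name), ("first_initial", first_name[:1]))
    for query_type, given in queries:
        results = search(last_name, given, province)
        # Assumption: a match means the same phone number and address.
        matched = any(r.phone == known.phone and r.address == known.address
                      for r in results)
        records.append(SearchRecord(directory, query_type, len(results), matched))
        if matched:
            break
    return records


def assign_orders(subject_ids: Iterable[str],
                  seed: int = 0) -> Dict[str, Tuple[str, str]]:
    """Randomize half of the subjects to each directory order."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return {
        sid: (("Canada411", "InfoSpace") if i < half else ("InfoSpace", "Canada411"))
        for i, sid in enumerate(ids)
    }
```

Each `SearchRecord` holds the two quantities recorded in the study for every search: whether a match was found and how many potential matches were returned.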



 
FIGURE 1. Study design and search procedures for tracing subjects by using two World Wide Web-based directories, Ontario, Canada, 1999.

 

 
TABLE 1. World Wide Web-based directories with Canadian content

 
The sample size for this study was estimated from a pilot study using 21 names randomly selected from entries in the 1997–1998 Bell Toronto telephone book. The results indicated that 80 subjects per group would be required to detect a difference of 14 percentage points (81 percent in Canada411 and 95 percent in InfoSpace) in finding possible matches between the two methods by using a two-sided test at a significance level of 5 percent and statistical power of 80 percent. To maintain the same statistical power in this study, in which comparisons were made for males and females separately, the total sample size was increased to 320. The p values for the difference between two proportions were calculated according to Fleiss (13), using EpiCalc 2000 Version 1.0 (14). Wilcoxon rank sum tests were used to test differences between the number of potential matches returned by the two directories and were performed by using the Statistical Analysis System (SAS) (15). The p values were based on two-tailed tests.
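The text does not state which formula was used for the pilot-based estimate. As a rough check, the sketch below applies one common normal-approximation formula for comparing two independent proportions, without continuity correction. The function name `n_per_group` is hypothetical, and the choice of formula is an assumption; other standard formulas give somewhat different values.

```python
import math
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05,
                power: float = 0.80) -> int:
    """Approximate subjects per group to detect p1 vs. p2 (two-sided test).

    Assumed formula (no continuity correction):
        n = (z_{1-alpha/2} + z_{power})^2 * (p1*q1 + p2*q2) / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Pilot proportions quoted in the text: 81 percent vs. 95 percent
print(n_per_group(0.81, 0.95))  # 81
```

Under this assumption the estimate is 81 subjects per group, close to the 80 reported. Doubling 2 × 80 = 160 to allow separate comparisons in males and females is consistent with the total of 320 given above.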


    RESULTS
 
In our sample of 346 Ontario adults, 170 subjects (49.1 percent) were found by both the Canada411 and the InfoSpace directories when searches using the last name and first name as well as the last name and first initial were taken into account (table 2). Of those located by both directories, Canada411 located 58.2 percent ((53 + 46)/170) of the subjects using last name and first name in the search, whereas InfoSpace located 35.3 percent ((53 + 7)/170) of them using the same information in the search. A total of 151 subjects (43.6 percent) were not found by either of the two directories, and of these subjects, 111 (73.5 percent) were female. In all, InfoSpace was able to locate 54.0 percent ((170 + 17)/346) of the subjects, and Canada411 was able to locate 51.4 percent ((170 + 8)/346). In other words, InfoSpace located 17 subjects who could not be located by Canada411, while Canada411 located eight subjects who could not be located by InfoSpace. Of the 17 subjects found only by InfoSpace, the majority (82.4 percent) were found by using a search with the last name and first initial. On the other hand, only two of the eight matches obtained with Canada411 alone were based on searches using the last name and first initial (table 2).
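The percentages in this paragraph follow from four counts given in the text: 170 subjects found by both directories, 17 found only by InfoSpace, 8 found only by Canada411, and 151 found by neither. The short Python check below recomputes them; the variable names are illustrative only.

```python
# Counts taken from the text (table 2 itself is not reproduced here)
both = 170
infospace_only = 17
canada411_only = 8
neither = 151

total = both + infospace_only + canada411_only + neither  # 346


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


summary = {
    "found by both": pct(both, total),                            # 49.1
    "found by InfoSpace": pct(both + infospace_only, total),      # 54.0
    "found by Canada411": pct(both + canada411_only, total),      # 51.4
    "found by at least one": pct(total - neither, total),         # 56.4
    "found by neither": pct(neither, total),                      # 43.6
}

for label, value in summary.items():
    print(f"{label}: {value} percent")
```

The figure of 56.4 percent for subjects found by at least one directory is the one quoted in the abstract.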


 
TABLE 2. Number and percent of matches found by the two World Wide Web-based directories, Ontario, Canada, 1999

 
Table 3 summarizes the search results with the two directories. Overall, the two directories were able to find similar numbers of subjects. When subjects were grouped by whether the first name or the first initial was used in the search field, differences between the two directories emerged. The Canada411 directory located significantly (p = 0.0002) more subjects than the InfoSpace directory when the last name and first name were used in the search, whereas the InfoSpace directory located significantly (p < 0.0001) more subjects than the Canada411 directory when the last name and first initial were used. Similar patterns were seen when males and females were examined separately and when subjects residing in Toronto and outside of Toronto were examined separately (data not shown).


 
TABLE 3. Number of matches found by the two World Wide Web-based directories, Ontario, Canada, 1999

 
Table 4 shows the median number of potential matches found in situations in which a match was ultimately found and in those situations in which no match was found. The number of potential matches returned by both directories was similar, except that InfoSpace returned significantly (p = 0.02) fewer matches in cases in which exact matches were not found when the last name and first name were used. Although the upper limit of the number of potential matches that can be displayed for each search request is as high as 500 for Canada411 and only as high as 250 for InfoSpace, this directory property did not bias the median of the number of potential matches in Canada411 to a significantly higher value because only four searches resulted in a list of potential matches of 500.


 
TABLE 4. Median and range of number of potential matches per search returned by the two World Wide Web-based directories, Ontario, Canada, 1999

 

    DISCUSSION
 
In this crossover study, we evaluated the completeness of two WWW-based directories for tracing subjects in a follow-up study. The study subjects were randomly selected from the respondents in an ongoing follow-up study. Since the addresses of these subjects were known to be correct and current at the time of the search, they served as the gold standard against which the search results from the WWW-based directories were compared.

Overall, 43.6 percent of the subjects could not be found by either of the directories. These subjects may not have appeared in the WWW-based directories because their telephone numbers were listed under another name, because they had unpublished telephone numbers, or because they had new telephone numbers (depending on the directory, a new telephone number could take up to several months before it is available from that source (table 1)). As expected, the majority of these untraceable subjects were female, which may reflect the fact that some women do not have separate listings under their own names. Indeed, when we used reverse lookup by telephone number, we found that 82.0 percent (91/111) of the women who were not found by either directory were not listed under their own names. (Reverse lookup, which is available only in InfoSpace, returns the name and street address under which a given telephone number is listed, or the name and telephone number associated with a given street address.) Of those 91 women, 40.7 percent (37/91) were listed with male first names, and 4.4 percent (4/91) were listed with the last name and second initial (but not the first) of the women. In addition, 10.8 percent (12/111) of the telephone numbers of the women who were not found by either directory were unpublished numbers, and 7.2 percent (8/111) of the numbers had unmatched names and addresses, indicating that the individuals had moved. In males, 45.0 percent (18/40) of the telephone numbers were listed under a different name, 22.5 percent (9/40) were unpublished numbers, and 27.5 percent (11/40) had unmatched names and addresses.

There appeared to be no major advantage to using one directory over the other in tracing subjects in our study sample. About half of the subjects were found by Canada411, and about half by InfoSpace, when searches by first name and by first initial were both used. Only eight subjects would have been missed if InfoSpace had been used alone, and 17 subjects would have been missed if Canada411 had been used alone (table 2). When the last name and first name were used in the search, Canada411 returned significantly more correct matches than did InfoSpace; when the last name and first initial were used, InfoSpace returned significantly more correct matches than did Canada411 (table 3). This is the result of differences between the two directories in the way that they handle nonmatches with a first-name search. In Canada411, when an exact first name match is not found, all names with a matching initial will be displayed. On the other hand, no match will be displayed (i.e., subjects with matching first initial will not be displayed) if an exact first name match is not found by InfoSpace. Since subjects with a matching first initial are displayed when a first name search does not locate any match in Canada411, a subsequent search with the first initial usually does not reveal any additional potential matches.

Although Canada411 may appear to be more useful than InfoSpace because it yields a greater number of matches when the last name and first name are used in the search, this directory results in a longer list of potential matches per search. In other words, more verification by telephone or mail per true match is required when using Canada411. Although the median number of potential matches was similar for Canada411 and InfoSpace, the maximum number of potential matches was higher for Canada411 than for InfoSpace in five of the six situations summarized in table 4. In the situation in which no true match could be found with last name and first name search, the median number of potential matches found by using Canada411 was significantly (p = 0.02) greater than when using InfoSpace. In fact, if a list of fewer than 10 potential matches were to be adopted as the practical limit that is cost-effective for verification, then 25 of 68 (36.8 percent) searches would have to be discarded from the Canada411 search but only six of 34 (17.6 percent) would have to be discarded from the InfoSpace search. To minimize the number of potential matches per true match, our results suggest the following strategy for tracing subjects using the two WWW-based directories that we evaluated. First, use InfoSpace with the last name and first name in the search field. If an exact match is not found, then use Canada411 with the last name and first name in the search field. If none of the potential matches returned by Canada411 is verified to be correct, then the last step would be to go back to InfoSpace and use the last name and first initial in the search field. Searching with Canada411 using the last name and first initial can be omitted because such a search will probably return no additional potential match other than the ones returned in step 2. Similar algorithms could be developed for directories other than those evaluated in this study, depending on the results of formal evaluations such as those described here.
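The three-step strategy above can be sketched as follows. The sketch assumes that each directory is wrapped in a hypothetical function that takes a last name and a first name or initial and returns a list of candidate listings, and that `verify` stands for confirming a candidate by telephone or mail. None of these names comes from the directories themselves.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interface: (last name, first name or initial) -> candidate listings
SearchFn = Callable[[str, str], List[str]]


def stepwise_trace(last_name: str, first_name: str,
                   infospace: SearchFn, canada411: SearchFn,
                   verify: Callable[[str], bool]
                   ) -> Tuple[Optional[str], Optional[str], int]:
    """Return (step label, verified listing, number of candidates checked)."""
    steps = [
        ("InfoSpace, first name", infospace, first_name),
        ("Canada411, first name", canada411, first_name),
        ("InfoSpace, first initial", infospace, first_name[:1]),
        # Canada411 with first initial is omitted, as suggested in the text.
    ]
    seen = set()  # avoid verifying the same candidate twice across steps
    for label, search, given in steps:
        for candidate in search(last_name, given):
            if candidate in seen:
                continue
            seen.add(candidate)
            if verify(candidate):
                return label, candidate, len(seen)
    return None, None, len(seen)
```

The third value returned is the number of candidates that had to be verified, which corresponds to the verification burden that the ordering of the steps is meant to reduce.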

We attempted to compare the Web-based approach with the more traditional method of using directory assistance. However, this revealed a number of limitations in the use of telephone books and operator-based directory assistance in tracing subjects in comparison with the Web-based approach. First, since our study subjects resided in a large geographic area (the province of Ontario), up-to-date telephone books for each of the regions (a region is defined here as a geographic area covered by a single telephone book) would have had to be obtained. For those subjects who had moved out of a region, additional time would be involved in locating them, as all telephone books for Ontario (and beyond) would have to be searched. Second, the information on both of the Web-based directories is more up-to-date than that available from the telephone book, since the former sources are updated every 3 months.

We tested the usefulness of operator-based directory assistance (Bell Canada), and our attempts yielded disappointing results for locating subjects with certain characteristics. Directory assistance is now fully computer automated with minimal operator involvement if information on the city, name, and street address is given to the computer. If any of this information is missing (this is the case in tracing, since the street name will not be available), the caller is transferred to an operator. If there is an exact match by last name, first name, and city, the operator will provide the telephone number of the match. If there is no exact match by last name and first name, the chance of obtaining a match is low. Operators will not perform a province-wide search if no city is provided by the caller. Therefore, subjects who have moved out of a city cannot be located by directory assistance. In addition, if there is no exact match by last name and first name, the operator will give one or two telephone numbers of individuals with matching last name and first initial. However, if there are many individuals with matching last name and first initial, the operator will not give the list unless the caller can provide a street name. Bell Canada directory assistance is useful for finding telephone numbers of individuals with a known street name and full names or when there is an exact match based on last names and first names but is not useful in other situations that are generally the norm when tracing subjects. On the other hand, the Web-based directories can provide a complete listing of all potential matching individuals if an exact match by last name and first name is not found. Although it may take some time to narrow down the search to the specific person of interest if the number of potential matches is large, the effort can be reduced if certain rules are used, such as trying those who reside in the same city or those with the same second initial, if available. Moreover, the list of potential matches can easily be printed from the computer, which can minimize error associated with transcribing the information from telephone books or from operator-based directory assistance.

Similar limitations appear to apply to directory assistance in the United States. In a follow-up of a clinical trial designed to reduce initiation of tobacco use in 16,915 adolescent orthodontic patients in Southern California (16), directory assistance was used to trace subjects who could not be reached by using the telephone number that they had provided at the beginning of the study 2 years earlier. The success rate using directory assistance was 16 percent. This success rate was similar to the number (15.3 percent) observed in our study when the last name and first name were used as the search criteria. However, when we used last name and first initial in the Web-based directories, the success rate increased to 49.1 percent (table 2).

In conclusion, this study represents the first attempt to evaluate the completeness of WWW-based directories in subject tracing for follow-up studies. We consider it likely that our results from the Ontario population can be generalized to other provinces in Canada. Since InfoSpace is also available in the United States, our results from InfoSpace Canada may also be applicable to tracing subjects there. At this point, WWW-based directories might provide a cheaper, quicker, and more convenient way of finding telephone numbers and addresses of study subjects compared with directory assistance from a telephone company. Until results from studies formally comparing the WWW approach with traditional approaches to subject tracing are available, reliance should not be placed solely on the WWW approach. All available approaches should be used to minimize bias due to nonparticipation.


    NOTES
 
Correspondence to Dr. Thomas E. Rohan, Department of Epidemiology and Social Medicine, Albert Einstein College of Medicine, 1300 Morris Park Avenue, New York, NY 10461 (e-mail: rohan@aecom.yu.edu).


    REFERENCES
 

  1. McAllister RJ, Goe SJ, Butler EW. Tracking respondents in longitudinal surveys: some preliminary considerations. Public Opinion Q 1973;37:413–16.
  2. Modan B. Some methodological aspects of a retrospective follow-up study. Am J Epidemiol 1966;82:297–304.
  3. Donohue L. Tracing lost research participants. Aust J Adv Nursing 1995;12:6–10.
  4. Hunt JR, White E. Retaining and tracking cohort study members. Epidemiol Rev 1998;20:57–70.
  5. Pallen M. Guide to the Internet. The World Wide Web. BMJ 1995;311:1552–6.
  6. Lawrence S, Giles CL. Searching the World Wide Web. Science 1998;280:98–100.
  7. Graber MA, Bergus GR, York C. Using the World Wide Web to answer clinical questions: how efficient are different methods of information retrieval? J Fam Pract 1999;48:520–4.
  8. Mann CE. Searching for HIV/AIDS information on the World Wide Web. J Assoc Nurses AIDS Care 1999;10:79–81.
  9. Rothman KJ, Cann CI, Walker AM. Epidemiology and the Internet. Epidemiology 1997;8:123–5.
  10. Bell DS, Kahn CE Jr. Health status assessment via the World Wide Web. Proc AMIA Annu Fall Symp 1996:338–42.
  11. Buchanan T, Smith JL. Using the Internet for psychological research: personality testing on the World Wide Web. Br J Psychol 1999;90 (Part 1):125–44.
  12. Eysenbach G, Diepgen TL. Epidemiological data can be gathered with World Wide Web. (Letter). BMJ 1998;316:72.
  13. Fleiss JL. Statistical methods for rates and proportions. 2nd ed. New York, NY: John Wiley & Sons, 1981:23–4.
  14. Gilman J, Myatt M. EpiCalc 2000. Version 1.00. London, England: Brixton Books, 1997.
  15. SAS Institute Inc. SAS/STAT user's guide. Version 6. Cary, NC: SAS Institute, Inc., 1990.
  16. Morrison TC, Wahlgren DR, Hovell MF, et al. Tracking and follow-up of 16,915 adolescents: minimizing attrition bias. Control Clin Trials 1997;18:383–96.

Received for publication August 12, 1999. Accepted for publication February 4, 2000.




