Accuracy and Repeatability of Commercial Geocoding

Eric A. Whitsel1,2, Kathryn M. Rose2, Joy L. Wood2, Amanda C. Henley3, Duanping Liao4 and Gerardo Heiss2

1 Department of Medicine, School of Medicine, University of North Carolina, Chapel Hill, NC.
2 Department of Epidemiology, School of Public Health, University of North Carolina, Chapel Hill, NC.
3 University of North Carolina Libraries, Chapel Hill, NC.
4 Department of Health Evaluation Sciences, Pennsylvania State University College of Medicine, Hershey, PA.

Received for publication January 16, 2004; accepted for publication June 28, 2004.


    ABSTRACT
 
The authors estimated accuracy and repeatability of commercial geocoding to guide vendor selection in the Life Course Socioeconomic Status, Social Context and Cardiovascular Disease study (2001–2002). They submitted 1,032 participant addresses (97% in Maryland, Minnesota, Mississippi, or North Carolina) to vendor A twice over 9 months and measured repeatability as agreement between levels of address matching, discordance (%) between statistical tabulation areas, and median distance (d, in meters) and bearing (θ, in degrees) between coordinates assigned on each occasion ($H_0: \sum_{i=1}^{n} \theta_i / n = 180^\circ$). They also submitted 75 addresses of nearby air pollution monitors (77% urban/suburban; 69% residential/commercial) to vendors A and B and then measured accuracy by comparing vendor- and US Environmental Protection Agency (EPA)–assigned geocodes using the above measures. Repeatability of geocodes assigned by vendor A was high (kappa = 0.90; census block group discordance = 5%; d < 1 m; θ = 177°). The match rate for EPA monitor addresses was higher for vendor B versus A (88% vs. 76%), but discordance at census block group, tract, and county levels also was, respectively, 1.4-, 1.9-, and 5.0-fold higher for vendor B. Moreover, coordinates assigned by vendor B were further from those assigned by the EPA (d = 212 m vs. 149 m; θ = 131° vs. 171°). These findings suggest that match rates, repeatability, and accuracy should be used to guide vendor selection.

air pollution; cardiovascular diseases; geographic information systems; reproducibility of results

Abbreviations: EPA, US Environmental Protection Agency; FIPS, Federal Information Processing Standards; LC-SES, Life Course Socioeconomic Status, Social Context and Cardiovascular Disease.


    INTRODUCTION
 
The TIGER/Line file is a digital database developed and revised periodically by the US Census Bureau to support its ongoing needs (1). This database includes spatially referenced census statistical boundaries and street maps covering the entire United States that allow commercial vendors, at the request of their clients, to assign statistical tabulation areas and spatial coordinates (collectively known as geocodes) to addresses by using geographic information systems software.

The use of commercially assigned geocodes to link contextual measures of socioeconomic and environmental exposures to individual study participants is becoming increasingly common in public health studies; yet, until recently, the public health literature remained virtually silent on the topic of error in commercial geocoding. In 2001, the demonstration that the accuracy and per-address cost of commercial geocoding vary dramatically across vendors underscored the importance of the topic (2). Although this important finding was subsequently interpreted in the context of other rapidly published studies (3–7), it became clear that accuracy deserves as much emphasis and consideration as completeness of commercial geocoding, if not more, in contextual studies of health and disease (8).

As an otherwise reasonable means of marketing their services and products to prospective clients, many commercial vendors nonetheless continue to emphasize high address match rates and the role of proprietary street databases in obtaining them. This emphasis may be both misplaced and misleading if, for example, a commercial vendor unwittingly uses a proprietary database to improve its match rates at the expense of overall accuracy and repeatability. Faced with selecting a vendor in the setting of the Life Course Socioeconomic Status, Social Context and Cardiovascular Disease (LC-SES) study, we therefore examined the address match rate, accuracy, and repeatability of commercial vendors.


    MATERIALS AND METHODS
 
The LC-SES study has been described in detail elsewhere (9). Briefly, during 2001–2002, we conducted an ancillary telephone survey of the biracial, population-based Atherosclerosis Risk in Communities (ARIC) cohort (10, 11). We asked participants living predominantly in one of four US communities (suburban Minneapolis, Minnesota; Washington County, Maryland; Forsyth County, North Carolina; and Jackson, Mississippi) to recall their complete residential addresses (street number, street name, city, state) at ages 30, 40, and 50 years. At the time of the survey, the cohort members were 56–80 years of age. Response to the survey was 93 percent. We corrected city misspellings, applied a two-character coding standard to states (12), and submitted the addresses to a commercial vendor (vendor A) for address matching accompanied by only an encrypted study identifier under the terms of a contract negotiated by our university counsel and approved by our institutional review board. Approximately 9 months later, we resubmitted to the same vendor a randomly selected subset of 1,032 unique addresses for participants at age 50 years. In the interim, the vendor updated its street database bimonthly. We measured the repeatability of geocodes assigned on the first and second submissions as the agreement between address match types (street, zip code, no match), concordance (percentage) between Federal Information Processing Standards (FIPS) codes associated with statistical tabulation areas (census block groups, tracts, counties) (13), and mean bearing (θ, in degrees) and distance (d, in meters) between spatial coordinates (longitudes, latitudes). More specifically, we measured agreement by using the kappa statistic and 95 percent confidence limits (14–16), θ by using angle trigonometry (17), and d by using the Haversine spherical Earth formula (figure 1) (18).



 
FIGURE 1. Estimation of distance (d, in meters) and bearing (θ, in degrees) between coordinates assigned on the initial submission and resubmission of addresses in the Life Course Socioeconomic Status, Social Context and Cardiovascular Disease study, United States, 2001–2002. Coordinates in decimal degrees were converted to radians (i.e., multiplied by π/180), differences between coordinates (e.g., $x_1 - x_0 = x_{\mathrm{dif}}$) were calculated, and d was estimated by using the Haversine spherical Earth formula and θ was estimated by using angle trigonometry. $C = 2 \sin^{-1}(\min(1, A^{0.5}))$, where $A = (\sin(x_{\mathrm{dif}}/2))^2 + \cos(x_0) \cos(x_1) (\sin(y_{\mathrm{dif}}/2))^2$. R, radius of the Earth (m).
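
The calculation summarized in figure 1 can be sketched in Python as follows. This is an illustration only, not the SAS code used in the study: the function names, the Earth radius of 6,371,000 m, and the forward-azimuth form of the bearing are assumptions, because the source states only that θ was obtained "by using angle trigonometry." As in the caption, x denotes latitude and y denotes longitude.

```python
import math

# Assumed mean Earth radius in meters; the source does not state the value of R.
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat0, lon0, lat1, lon1, radius=EARTH_RADIUS_M):
    """Great-circle distance d (m) between two points given in decimal degrees."""
    x0, y0, x1, y1 = map(math.radians, (lat0, lon0, lat1, lon1))
    xdif = x1 - x0  # difference in latitude (radians)
    ydif = y1 - y0  # difference in longitude (radians)
    a = math.sin(xdif / 2) ** 2 + math.cos(x0) * math.cos(x1) * math.sin(ydif / 2) ** 2
    # min(1, ...) guards against rounding pushing sqrt(a) slightly above 1.
    c = 2 * math.asin(min(1.0, math.sqrt(a)))
    return radius * c


def bearing_deg(lat0, lon0, lat1, lon1):
    """Initial bearing from point 0 to point 1, in degrees clockwise from north, in [0, 360).

    Assumption: the standard forward-azimuth formula; the authors' exact
    trigonometric construction may differ.
    """
    x0, y0, x1, y1 = map(math.radians, (lat0, lon0, lat1, lon1))
    ydif = y1 - y0
    east = math.sin(ydif) * math.cos(x1)
    north = math.cos(x0) * math.sin(x1) - math.sin(x0) * math.cos(x1) * math.cos(ydif)
    return math.degrees(math.atan2(east, north)) % 360.0
```

With bearings expressed on a 0–360° scale, directions of error that are scattered uniformly around the compass average about 180°, which is consistent with the reference value used for θ in this article.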

 
We also submitted to vendor A and another vendor (vendor B) the addresses of 75 air pollution monitors located near LC-SES participants’ current residences that were spatially referenced by the US Environmental Protection Agency (EPA) by using standard geographic methods (19). These positionally unique monitors measured ambient concentrations of criteria air pollutants between 1997 and 2002 (20). Both vendors used CASS-certified (21) address standardization software, the 1990 TIGER/Line file, Zip+4 file (22), and default offsets (differing by only 10 feet (3 m)) to assign geocodes as accurately as possible to exact street addresses or zip+4, zip+2, or zip code centroids. Vendor B also used a proprietary street database to improve match rates.

To compare accuracy of geocodes provided by the vendors, we overlaid the coordinates of monitors in the EPA database on 1990 census block group, tract, and county maps after converting, when necessary, point and polygon files to a standard geographic coordinate system, the North American Datum of 1983 (NAD83), using ArcView GIS 3.3 software and ESRI Data & Maps 2000 (23). We then used the measures of concordance, bearing, and distance defined above to estimate the accuracy of geocodes assigned by vendors by comparing them with FIPS codes obtained indirectly from our maps or coordinates obtained directly from the EPA database.

We estimated all measures of repeatability and accuracy by using SAS software, version 8.02 (24). We based all analyses of coordinates on values recorded in decimal degrees with six significant digits after the decimal point. We classified agreement as excellent (kappa > 0.75), good (0.4 < kappa ≤ 0.75), or marginal (0 ≤ kappa < 0.4) according to Landis and Koch (25). For reference, we defined repeatability and accuracy as perfect when kappa = 1, concordance = 100 percent, $\sum_{i=1}^{n} d_i / n = 0$ m, and $\lim_{d_i \to 0} \left( \sum_{i=1}^{n} \theta_i / n \right) = 180^\circ$. In the information that follows, we deliberately continue the practice of generically labeling commercial vendors to mask their identity (2).
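
For readers who wish to reproduce the agreement measure, the unweighted kappa statistic (14) and the classification above can be sketched in Python as follows. The function names and the example match types are hypothetical, confidence limits are omitted, and the code is not the SAS program used in the study. Because the stated cutoffs leave kappa = 0.4 unassigned, the sketch assumes that it belongs to the "good" category.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two equally long sequences of categories."""
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(first)
    # Observed proportion of items assigned the same category on both occasions.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance from the marginal category frequencies.
    counts_first, counts_second = Counter(first), Counter(second)
    expected = sum(counts_first[k] * counts_second[k] for k in counts_first) / n**2
    if expected == 1.0:
        return 1.0  # only one category was ever used, so agreement is complete
    return (observed - expected) / (1.0 - expected)


def classify_agreement(kappa):
    """Apply the cutoffs given in the text (kappa = 0.4 treated as 'good' by assumption)."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.4:
        return "good"
    if kappa >= 0.0:
        return "marginal"
    return "below chance"


# Hypothetical match types for 10 addresses on the first and second submission.
submission_1 = ["street"] * 6 + ["zip"] * 2 + ["none"] * 2
submission_2 = ["street"] * 5 + ["zip"] * 3 + ["none"] * 2
kappa = cohen_kappa(submission_1, submission_2)
```

In this made-up example, 9 of 10 addresses receive the same match type on both occasions and chance agreement is 0.40, so kappa is about 0.83.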


    RESULTS
 
Table 1 describes the characteristics of the 1,032 LC-SES participant addresses used to estimate the repeatability of geocoding. A majority of the addresses (89 percent) were complete. For the remainder, only street names (8 percent) or intersecting street names (3 percent), city, and state were available. Most were located in Maryland or North Carolina (57 percent), fewer in Mississippi or Minnesota (39 percent), and the fewest in other states (3 percent).


 
TABLE 1. Characteristics of the 1,032 participant addresses used to estimate the repeatability of geocoding, Life Course Socioeconomic Status, Social Context and Cardiovascular Disease study, United States, 2001–2002
 
Table 2 presents the 9-month repeatability of geocoding the 1,032 LC-SES participant addresses. Although there was some evidence of temporal improvement in street matching between April 2002 and January 2003, overall agreement of address matching was excellent (kappa = 0.90 (95 percent confidence interval: 0.86, 0.93)) and was consistent with the high concordance between FIPS codes at the block group level (95 percent), low mean distance between coordinates (<1 m), and mean bearing between coordinates (177°), approximating that of a perfectly repeatable measure.


 
TABLE 2. Nine-month repeatability of geocoding the 1,032 participant addresses, Life Course Socioeconomic Status, Social Context and Cardiovascular Disease study, United States, 2001–2002*
 
Table 3 describes the characteristics of the 75 EPA-operated monitors used to estimate the accuracy of geocoding. Most were associated with complete addresses (75 percent) located in Minnesota or North Carolina (84 percent), located in urban or suburban settings (77 percent), and designated for residential or commercial use (69 percent). They were established between 1965 and 2001 by using a known method of geographic positioning (49 percent), horizontal accuracy of coordinates (49 percent), and datum (37 percent).


 
TABLE 3. Characteristics of the 75 ambient criteria air pollutant monitor* addresses used to estimate the accuracy of geocoding, Life Course Socioeconomic Status, Social Context and Cardiovascular Disease study, United States, 2001–2002
 
Table 4 presents the concordance between FIPS codes associated with vendor- and EPA-assigned coordinates. At the census block group, tract, and county levels, the address match (concordance + discordance) rate was lower for vendor A versus vendor B (76 percent vs. 88 percent). In contrast, discordance between FIPS codes at the block group, tract, and county levels was, respectively, 1.4-, 1.9-, and 5.0-fold greater for vendor B versus vendor A.
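
The relation among these quantities (match rate = concordance + discordance) can be illustrated with a short Python sketch. It assumes 12-digit block group FIPS codes (2 state + 3 county + 6 tract + 1 block group digits), uses all submitted addresses as the denominator, and represents unmatched addresses as None; the function name and the codes are made up for illustration and are not study data.

```python
# Number of leading digits of a 12-digit block group FIPS code that
# identify each statistical tabulation area.
FIPS_PREFIX = {"county": 5, "tract": 11, "block_group": 12}


def fips_agreement(vendor_codes, reference_codes, level="block_group"):
    """Percent matched, concordant, and discordant at the requested level.

    vendor_codes: vendor-assigned FIPS codes, or None where no match was returned.
    reference_codes: reference FIPS codes for the same addresses.
    """
    if len(vendor_codes) != len(reference_codes) or len(reference_codes) == 0:
        raise ValueError("code lists must be non-empty and of equal length")
    width = FIPS_PREFIX[level]
    n = len(reference_codes)
    matched = concordant = 0
    for vendor, reference in zip(vendor_codes, reference_codes):
        if vendor is None:
            continue  # unmatched addresses count toward neither category
        matched += 1
        if vendor[:width] == reference[:width]:
            concordant += 1
    discordant = matched - concordant
    return {
        "match_rate": 100.0 * matched / n,
        "concordance": 100.0 * concordant / n,
        "discordance": 100.0 * discordant / n,
    }


# Hypothetical codes for four addresses; the third was not matched by the vendor.
vendor = ["370670001001", "370670001002", None, "370670002001"]
reference = ["370670001001", "370670001001", "370670003001", "370670002001"]
by_block_group = fips_agreement(vendor, reference, "block_group")
by_tract = fips_agreement(vendor, reference, "tract")
```

In this example the match rate is 75 percent at both levels, whereas discordance is 25 percent at the block group level and 0 percent at the tract level, which shows why a match rate alone does not describe accuracy.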


 
TABLE 4. Accuracy of geocoding the 75 ambient criteria air pollutant monitor addresses in the Life Course Socioeconomic Status, Social Context and Cardiovascular Disease study, expressed as the concordance between FIPS* codes, United States, 2001–2002†
 
Table 5 presents the distribution of distance and bearing between vendor- and EPA-assigned coordinates. Among all matched addresses, coordinates assigned by vendor B versus vendor A were further from those assigned by the EPA (d = 212 m vs. 149 m; θ = 131° vs. 171°). Between-vendor differences in d, but not θ, were attenuated by serially excluding addresses matched by only a single vendor and only at the zip code level.


 
TABLE 5. Accuracy of geocoding the 75 ambient criteria air pollutant monitor addresses in the Life Course Socioeconomic Status, Social Context and Cardiovascular Disease study, expressed as the distance* and bearing† between coordinates, United States, 2001–2002
 

    DISCUSSION
 
With increasing frequency, public health professionals are using geocodes assigned by commercial vendors on a fee-per-address basis to link socioeconomic and environmental data to individual study participants to explore the putative role of context in health and disease (26–28). The extent to which the growing accessibility of affordable commercial geocoding services, familiarity of investigators with hierarchic modeling strategies, and emphasis of commercial vendors on the role of proprietary street databases in obtaining high address match rates have contributed to this increase remains unknown.

Our examination of the accuracy and repeatability of commercial geocoding in the LC-SES study suggests that emphasis on address match rates and proprietary street databases may at least be misplaced. In general, we found that geocoding performance is inadequately captured by a single statistic such as the match rate and that proprietary databases may, according to several measures, actually reduce geocoding accuracy. Specifically, in our study, the address match rate was greater for the commercial vendor that used a proprietary street database to improve its match rates, yet this discrepancy was partly attributable to its higher discordance across all statistical tabulation areas. Moreover, spatial coordinates assigned by the vendor in this study that did not use a proprietary street database to improve its match rates were closer, in terms of both distance and bearing, to those in the EPA database. The 9-month repeatability of this vendor also was uniformly high across all performance measures.

At face value, a higher match rate would seem to imply lower discordances, smaller distances, bearings that approximate 180°, and high repeatability. Instead, we found the opposite to be true. Our findings, although seemingly counterintuitive, add important information to that previously published by the few other studies of this kind (2, 6, 7). Although the study reported here examined only 1,032 residential addresses, 75 addresses of nearby air pollutant monitors, and two commercial vendors, its sample of "real-world" addresses was relatively large and distributed throughout more than four US states spanning a variety of geographic regions, settings, and predominant land uses.

Issues of generalizability to other commercial vendors notwithstanding, the study reported on here consistently identified the same vendor as the most accurate in terms of its lower FIPS code discordance across statistical tabulation areas. It also characterized the repeatability of this vendor and estimated its accuracy in terms of bearing (measured by using angle trigonometry (17)) and distance (measured by using the Haversine spherical Earth formula (18)) between vendor- and EPA-assigned spatial coordinates recorded with six significant digits. Importantly, the results based on the two additional definitions of accuracy were consistent with those based on FIPS code concordance. They also increased the relevance of our findings for a broad range of future investigations including those estimating proximity to point sources of pollution, major transportation arterials, and various community resources.

Our study may well be more broadly relevant than its predecessors, but using spatial coordinates of EPA monitors as a geodetic standard may have introduced error into the objective assessment of accuracy. Precise locations of monitors and their addresses, for example, may differ (7). In addition, the EPA often established monitors by using (a now) unknown means of geographic positioning, accuracy, or datum (29). However, monitor coordinates are collected according to a Federal Interagency Coordinating Committee on Digital Cartography accuracy standard of 25 m by using standard methods and equipment including geodetic- and navigation-quality global positioning systems (19). Moreover, in this study, the median accuracy of coordinates established by using known methods was high (3 m). Median values of d and θ also suggested that coordinates established by using unknown methods were at least as accurate. Furthermore, distances between points determined in different geographic coordinate systems are less than 1 m for the NAD83 versus WGS84 datum and comparably small for the NAD27 versus NAD83 datum (within the range of coordinates encountered in this study) (30, 31). Global positioning system coordinates may nevertheless reflect error due to availability of base station or satellite signals, satellite clock errors, ephemeris, tropospheric or ionospheric delays, multipath or receiver noise, and, perhaps most importantly, operator error (32). Therefore, our study emphasizes the between-vendor comparison of accuracy and within-vendor estimation of repeatability rather than within-vendor estimation of accuracy per se.

In summary, address match rates can be misleading when presented in isolation as measures of geocoding performance to uninformed investigators. Although these findings await confirmation, in the interim, investigators may want to invest resources to protect the integrity and quality of their data. We conservatively recommend submitting addresses from a given data set en bloc to attenuate effects of street database updates; using known criteria to set problematic addresses to "missing" as a quantifiable means of reducing error in commercially assigned geocodes (until we know more about how inaccurate geocodes affect spatially interpolated exposures, exposure-outcome associations, and their contextual effect modifiers); and using match rates, accuracy, and repeatability to guide selection of commercial vendors. Finally, we call on the geocoding industry to implement an independent system for assessing and controlling the quality of commercially geocoded data using standardized measures, one that is by necessity transparent, but not to the point of compromising the proprietary nature of commercial services or products. We would otherwise be guilty of accepting unknown levels of error in our data and unjustifiably demanding much less from commercial geocoders than, say, our commercial laboratories.


    ACKNOWLEDGMENTS
 
Grants from the National Heart, Lung, and Blood Institute (R01-HL64142-03) funded this study and supported Dr. Whitsel (5-T32-HL07055).

Preliminary findings have been reported elsewhere (33).

The authors are indebted to Dr. Richard L. Smith for providing helpful comments on this manuscript. They also thank the staff of the ARIC study for their important contributions.


    NOTES
 
Reprint requests to Dr. Eric A. Whitsel, Departments of Medicine and Epidemiology, University of North Carolina, Cardiovascular Disease Program, Bank of America Center, Suite 306, 137 East Franklin Street, Chapel Hill, NC 27514 (e-mail: ewhitsel@email.unc.edu).


    REFERENCES
 

  1. US Census Bureau. Topologically Integrated Geographic Encoding and Referencing (TIGER) system, 2004. (www.census.gov/geo/www/tiger/index.html).
  2. Krieger N, Waterman P, Lemieux K, et al. On the wrong side of the tracts? Evaluating the accuracy of geocoding in public health research. Am J Public Health 2001;91:1114–16.
  3. Krieger N, Waterman P, Chen JT, et al. Zip code caveat: bias due to spatiotemporal mismatches between zip codes and US census-defined geographic areas: The Public Health Disparities Geocoding Project. Am J Public Health 2002;92:1100–2.
  4. Krieger N, Chen JT, Waterman PD, et al. Geocoding and monitoring of US socioeconomic inequalities in mortality and cancer incidence: does the choice of area-based measure and geographic level matter? The Public Health Disparities Geocoding Project. Am J Epidemiol 2002;156:471–82.
  5. Hurley SE, Saunders TM, Nivas R, et al. Post office box addresses: a challenge for geographic information system-based studies. Epidemiology 2003;14:386–91.
  6. Bonner MR, Han D, Nie J, et al. Positional accuracy of geocoded addresses in epidemiologic research. Epidemiology 2003;14:408–12.
  7. Cayo MR, Talbot TO. Positional error in automated geocoding of residential addresses. Int J Health Geograph 2003;2:10.
  8. Krieger N. Place, space, and health: GIS and epidemiology. Epidemiology 2003;14:384–5.
  9. The Life Course SES (Socioeconomic Status), Social Context and Cardiovascular Disease (LCSES) study, 2004. (www.lifecourseepi.info/#).
  10. ARIC (Atherosclerosis Risk in Communities) study, 2004. (www.cscc.unc.edu/ARIC/#).
  11. The ARIC Investigators. The Atherosclerosis Risk in Communities (ARIC) study: design and objectives. Am J Epidemiol 1989;129:687–702.
  12. United States Postal Service. Official USPS abbreviations. State abbreviations, 1998. (www.usps.com/ncsc/lookups/usps_abbreviations.html#states).
  13. US Department of Commerce, National Institute of Standards and Technology, Information Technology Laboratory. Federal information processing standards publications (FIPS PUBS), 2003. (www.itl.nist.gov/fipspubs).
  14. Cohen J. A coefficient of agreement for nominal scales. Educat Psychol Measure 1960;20:37–46.
  15. Cicchetti DV, Allison T. A new procedure for assessing reliability of scoring EEG sleep recordings. Am J EEG Technol 1971;11:101–9.
  16. Fleiss JL, Cohen J, Everitt BS. Large sample standard errors of kappa and weighted kappa. Psychol Bull 1969;72:323–7.
  17. Varberg D, Purcell EJ, Rigdon SE, eds. Functions and limits: the trigonometric functions. In: Calculus. 8th ed. Upper Saddle River, NJ: Prentice Hall, 2000:56.
  18. Sinnott RW. Virtues of the Haversine. Sky Telescope 1984;68:159.
  19. US Environmental Protection Agency. Information resources management policy manual. EPA directive 2100. Chap 13. Locational data, April 8, 1991.
  20. United States Environmental Protection Agency. Airdata, 2003. (www.epa.gov/air/data/info.html).
  21. United States Postal Service, National Customer Support Center certification programs. Coding Accuracy Support System (CASS), 1997. (www.usps.com/ncsc/programs/cass.html).
  22. United States Postal Service, Zip+4 Product, 2004. (www.usps.com/ncsc/addressinfo/zip4.htm).
  23. Environmental Systems Research Institute, Inc, Redlands, CA, 2004. CD-ROM metadata. (www.esri.com). (Boundary data were extracted and processed en bloc from the 1990 TIGER/Line file to assure perfect overlay characteristics).
  24. SAS Institute, Inc. SAS software. Cary, NC: SAS Institute, Inc, 2004. (www.sas.com).
  25. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.
  26. Diez-Roux AV, Merkin SS, Arnett D, et al. Neighborhood of residence and incidence of coronary heart disease. N Engl J Med 2001;345:99–106.
  27. LeClere F, Rogers R, Peters K. Neighborhood social context and racial differences in women’s heart disease mortality. J Health Soc Behav 1998;3:91–107.
  28. Pope CA III, Burnett RT, Thun MJ, et al. Lung cancer, cardiopulmonary mortality, and long-term exposure to fine particulate air pollution. JAMA 2002;287:1132–41.
  29. US Environmental Protection Agency. Locational Data Improvement Project (LDIP), 2002. (www.epa.gov/enviro/html/locational/ldip).
  30. National Imagery and Mapping Agency. Department of Defense World Geodetic System 1984 (WGS84): its definition and relationships with local geodetic systems. 3rd ed. (Technical report NIMA TR 8350.2). (January 3, 2000). (ftp://164.214.2.65/pub/gig/tr8350.2/wgs84fin.pdf).
  31. US Department of Commerce. National Oceanic and Atmospheric Administration. North American Datum of 1983 (NAD83). Schwartz CR, ed. 1989. (NOAA professional paper NOS 2).
  32. Liadis JS. GPS TIGER accuracy analysis tools (GTAAT) evaluation and test results (May 24, 2000). (www.census.gov/geo/mod/gtaat2000.pdf).
  33. Whitsel EA, Rose KM, Wood JL, et al. Accuracy and repeatability of commercial geocoding in the Life Course Socioeconomic Status, Social Context & Cardiovascular Disease Study. (Abstract). Circulation 2004;109:25.