This assignment is intended to give you an opportunity to experiment with the R-based GIS workflow demonstrated during Thursday’s seminar. The specific learning objectives are:
- arcpullr to load data from an ArcGIS GeoService API endpoint
- ggplot2/geom_sf and basemaps

You will make a series of maps during this assignment, using data from Portland, Oregon’s GIS data portal and the US Census Bureau.
You will need to sign up for a key for the Census’s various APIs.
Here are some of the commands you may need for this assignment:
sf (working with shapefiles and geometry)
- st_read: Reads an ArcGIS shapefile (as well as many other kinds of geospatial data files)
- st_crs: Access CRS information, either for a given object or (for standard numbered EPSG CRSs) by lookup number:
  - st_crs(my_data): Ask my_data what its CRS is
  - st_crs(3857): Look up EPSG 3857
- st_transform: Change the CRS of a variable
- st_bbox: Find the bounding box of a geometry
- geom_sf/coord_sf: Work with SF geometries in ggplot

And here is a cheat sheet for assorted other st_* functions.
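
As a quick illustration of the CRS helpers listed above, here is a minimal sketch using a single made-up point rather than any of the assignment's datasets. The coordinates and the 500-unit buffer distance are assumptions chosen only for the example.

```r
library(sf)

# Hypothetical example point near downtown Portland, as longitude/latitude (EPSG 4326)
pt <- st_sfc(st_point(c(-122.68, 45.52)), crs = 4326)

st_crs(pt)    # ask an object what its CRS is
st_crs(3857)  # look up a CRS by its EPSG number

# Re-project the point into EPSG 3857 (Web Mercator)
pt_merc <- st_transform(pt, 3857)

# Buffer the point (distance is in the CRS's units) and find the bounding box of the result
st_bbox(st_buffer(pt_merc, dist = 500))
```

The same calls work on whole layers (data frames with a geometry column), not just on single geometries.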
arcpullr (ArcGIS API)
- get_spatial_layer: Loads an entire layer from the ArcGIS server; for some data sources, this may involve a large quantity of data
- get_layer_by_poly: Loads a layer, restricted to the features located within a particular geographic extent (polygon)
tidycensus (Census Bureau API)
- get_acs: Retrieves data from the American Community Survey 1- or 5-year snapshot
- get_decennial: Retrieves data from the specified decennial census, if available
- load_variables: Retrieves the data dictionary for the specified data product (acs, etc.)

For this part of the assignment, you will work with Portland, Oregon’s GIS data portal and the arcpullr package. You will loosely follow

The datasets you will use are as follows:
- https://www.portlandmaps.com/od/rest/services/COP_OpenData_Boundary/MapServer/125/
- https://www.portlandmaps.com/od/rest/services/COP_OpenData_Environment/MapServer/35/
- https://www.portlandmaps.com/od/rest/services/COP_OpenData_ImportantPlaces/MapServer/40/

Note: I’ve included the relevant API URL for each dataset (for use with get_spatial_layer) as a convenience; you can find it on your own in the ArcGIS web UI as well, but a little bit of adjustment is needed to get it in the format that arcpullr expects.
Begin by familiarizing yourself with the datasets and their contents. Possible questions to ask:
Using arcpullr’s get_spatial_layer(), load the three datasets listed above; check the resulting data frames to ensure that the numbers of features and attributes are what you expect.
Using ggplot2 and geom_sf, produce an overview map of the city’s neighborhood boundaries.
- geom_sf listens for the fill and color aesthetics, just like geom_point() etc.

Your final product should look something like this:
Next, pick a neighborhood on which to focus your analysis (if you’re not familiar with Portland, try “Foster-Powell”).
- coord_sf to control the bounds of your ggplot-based map.
- filter command with SFC objects.
- basemaps package’s draw_ext() function
- geom_sf_label(), which will make map labels based on feature attributes (e.g. neighborhood names; note that the neighborhood boundary dataset has an attribute named MAPLABEL for this exact purpose).
- coord_sf are specified programmatically (i.e., without you having to type them in or figure them out by hand).
- st_buffer()

Finally, write an R function that takes as an argument the name of a neighborhood, and automatically produces a version of the map from Step “C”. It might look something like this:
make.neighborhood.map <- function(some.neighborhood.name) {
  # Stuff happens here
}
And then the idea is that one could call it like so:
make.neighborhood.map("FOSTER-POWELL")
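
To illustrate the general pattern (select one feature, compute padded bounds programmatically, and zoom to them), here is a minimal sketch on toy data. It is not the intended solution: the two square "neighborhoods", the function name zoom_map, and the 200-unit margin are all hypothetical, and base R subsetting is used where dplyr's filter would work equally well.

```r
library(sf)
library(ggplot2)

# Helper to build a square polygon (toy geometry only)
make_square <- function(x0, y0, size = 1000) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
    c(x0, y0 + size), c(x0, y0)
  )))
}

# Toy stand-in for a neighborhood layer, in a projected CRS (EPSG 3857)
toy <- st_sf(
  MAPLABEL = c("A", "B"),
  geometry = st_sfc(make_square(0, 0), make_square(1000, 0), crs = 3857)
)

# Hypothetical function: zoom the map to one labelled feature plus a margin
zoom_map <- function(layer, label, margin = 200) {
  focus <- layer[layer$MAPLABEL == label, ]
  stopifnot(nrow(focus) > 0)  # fail early if the label is not found

  # Bounds come from the data: buffer the feature, then take its bounding box
  bounds <- st_bbox(st_buffer(focus, dist = margin))

  ggplot(layer) +
    geom_sf(fill = "grey90", color = "grey40") +
    geom_sf(data = focus, fill = "steelblue") +
    geom_sf_label(aes(label = MAPLABEL)) +
    coord_sf(
      xlim = c(bounds[["xmin"]], bounds[["xmax"]]),
      ylim = c(bounds[["ymin"]], bounds[["ymax"]])
    )
}

p <- zoom_map(toy, "A")
p
```

Because the limits are derived from the selected feature, the same function works for any label present in the layer without hand-typed coordinates.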
Before you can use the Census API and the tidycensus library, you must register for an API key (see link in the “Before You Begin” section). Once you have done this, run the following chunk of code once per session, just after loading the tidycensus package, like so:
library(tidycensus)
options(tigris_use_cache = TRUE) # don't re-download geometry unnecessarily
census_api_key("YOUR_CENSUS_KEY_GOES_HERE")
Spend some time with the data dictionaries for both the 2020 5-year American Community Survey dataset and the 2010 Decennial summary file dataset (the 2020 data is only just now starting to appear, and the summary files are not ready yet):
load_variables(2020, "acs5/profile") # ACS
load_variables(2010, "sf1") # 2010 Decennial
Note: Some of the 2020 data is available; the “Public Law” dataset (used for congressional redistricting) has been released. This dataset focuses largely on basic population counts and simplified racial/ethnic demographics, and is quite useful for many analytical questions.
There are a very large number of variables in the ACS and Decennial datasets, so as you familiarize yourself with the data, you may find it useful to browse them with RStudio’s View command, which lets you search for keywords of interest (“race”, “housing”, etc.):
load_variables(2020, "acs5/profile") %>% View
As another option, the Census maintains detailed documentation on all of its available datasets; that page is a bit overwhelming, so here are shortcuts to the lists of categories of variables for the datasets (click “selected variables” to expand a group):
Once you’ve got a sense of what data are available, pick a set of variables to focus on. For example, you might focus on demographic characteristics (proportion of the population belonging to a particular group, or having a particular ancestry), or you might be more interested in questions of housing, employment, etc.
What variable(s) did you choose, and why? What can you find out about the way that the Census defines, collects, and reports this sort of data?
Depending on the question that has caught your interest, you may wish to look for previous years’ data as well, in case longitudinal change is of interest to you.
Choose one of the following:
- tidycensus’s get_acs or get_decennial functions to pull down county-level data for your variable of interest.

(If you would rather work on a geography other than state or county, feel free to do so.)
Either way, make a choropleth map using geom_sf and its fill aesthetic.
- tidycensus functions can return spatial as well as numeric data, by setting geometry=TRUE

Try to find a map projection specific to the state that you are mapping (e.g. “Oregon Lambert” for Oregon); use st_transform to re-project your data into that new projection before plotting it. If you are making a tract-level map of a specific county or set of counties, you can still use their home state’s projection.
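
As a rough sketch of the re-project-then-plot pattern, the example below uses the North Carolina county dataset that ships with the sf package instead of Census data, so that it runs without an API key. EPSG 32119 ("NAD83 / North Carolina") and the BIR74 attribute belong to this illustration only; substitute the projection and variable appropriate to your own state.

```r
library(sf)
library(ggplot2)

# Example data shipped with sf: North Carolina counties
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Re-project into a state-specific CRS before plotting
nc_proj <- st_transform(nc, 32119)

# Choropleth: map a numeric attribute (births in 1974) to the fill aesthetic
p <- ggplot(nc_proj) +
  geom_sf(aes(fill = BIR74), color = "white")
p
```

Since geom_sf draws in the CRS of the data it is given, transforming first is enough to change the projection of the finished map.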
Where to find projections:
The default colors are probably not appropriate for your data; use what you have learned earlier in the program about data visualization and ggplot, and give that aspect of your map some attention.
- ggplot has themes, e.g. theme_bw, any of the ggthemes choices, etc.

It is important for maps (and all data visualizations) to have appropriate labels, to help provide the viewer with context. In this part of the assignment, take a moment to add a title, and, if appropriate, a subtitle or caption to your map, providing context about what it is showing and where the data came from. For maps, it is also good practice to include information about the scale of the map; the annotation_scale geom from ggspatial can be used to do so:
my.previous.example.map + annotation_scale()
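
Putting the polish steps together, one possible sketch looks like the following. It again uses the North Carolina example data from the sf package rather than Census data; the viridis palette, the theme, and the title and caption text are illustrative choices, not requirements.

```r
library(sf)
library(ggplot2)
library(ggspatial)

# Example data shipped with sf, re-projected to a state-specific CRS (EPSG 32119)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc_proj <- st_transform(nc, 32119)

p <- ggplot(nc_proj) +
  geom_sf(aes(fill = BIR74), color = "white") +
  scale_fill_viridis_c(name = "Births, 1974") +   # replace the default palette
  labs(
    title = "Births by county, North Carolina, 1974",
    caption = "Data: 'nc' example dataset from the sf package"
  ) +
  annotation_scale(location = "bl") +             # scale bar, bottom left
  theme_bw()
p
```

A continuous palette suits a numeric variable such as this one; a categorical variable would call for a discrete scale instead.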
Now let’s combine our City of Portland data and our Census data. Portland’s demographic geography has a very long and complex history, and it has only become more so in recent years; suffice it to say that racial and ethnic groups are not uniformly distributed across the city.
What else is not uniformly distributed? Grocery stores! In the last part of this assignment, we will overlay our grocery store data over demographic geography, and see what patterns emerge.
Using get_acs(), pull down tract-level data for
Multnomah County, Oregon. We will use the 2020 ACS 5-year profile
dataset, and use variable DP05_0065P (“Race alone or in
combination with one or more other races/Total population/Black or
African American”, percentage form). Make sure to pull down the geometry
for the county’s census tracts!
Note: As mentioned during the lecture, this is a very naïve way to be using ACS data; it’s adequate for the purposes of this assignment, but please be careful following this specific recipe in your own “real-world” work.
Make a quick map; what patterns do you notice?
Look at your plots of Portland’s neighborhoods from earlier in the assignment; compare with the county-level map you just made. Note that the county covers much more geographic area than the city itself!
Our grocery store dataset only extends to the city limits; for our map of Census data to match, we must somehow exclude all of the census tracts that are outside the city.
To do this, we will intersect our census tract geometry with
the geometry representing the city’s different neighborhoods.
st_intersection() will take two geometries, X
and Y, and compute a new version of X that only includes
the shared geometry.
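
To see what this does in practice, here is a small sketch on toy squares rather than the real tract and neighborhood layers. Two details are assumptions of this example rather than requirements of the assignment: the Y layer is first merged into a single shape with st_union(), so that each X feature is clipped to the overall outline instead of being split along internal Y boundaries, and both layers are created in the same CRS (with real data, st_transform one layer to match the other first, since st_intersection expects matching CRSs).

```r
library(sf)

# Helper to build a square polygon (toy geometry only)
square <- function(x0, y0, size) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
    c(x0, y0 + size), c(x0, y0)
  )))
}

# X: three toy "tracts", each 10 x 10 units, side by side
tracts <- st_sf(
  tract = c("t1", "t2", "t3"),
  geometry = st_sfc(square(0, 0, 10), square(10, 0, 10), square(20, 0, 10),
                    crs = 3857)
)

# Y: two toy "neighborhoods", each 5 x 5 units, overlapping t1 and t2 only
hoods <- st_sf(
  hood = c("h1", "h2"),
  geometry = st_sfc(square(5, 0, 5), square(10, 0, 5), crs = 3857)
)

# Merge Y into one outline, then keep only the parts of X that fall inside it
city_outline <- st_union(hoods)
clipped <- st_intersection(tracts, city_outline)

clipped$tract     # tracts with no overlap (t3) are dropped
st_area(clipped)  # the remaining tracts are cut down to the shared area
```

st_intersection may warn that attribute values are assumed to be spatially constant; for a percentage that describes a whole tract, that assumption is worth keeping in mind when a tract is only partly inside the outline.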
Using st_intersection(), produce a version of the ACS data from Step “A” that only includes tracts from the city itself, and excludes anything from the rest of Multnomah County. Plot the result.
Note: You may notice that the census tracts look quite similar to the city’s administrative neighborhoods; this is not an accident: the Census makes use of pre-existing administrative and geographic boundaries as starting points when defining its tracts.
Take the resulting map from Step “B”, and overlay the grocery store data that you worked with in Part 1. For this final product, pay attention to the same questions of “polish” that you worked with in Step 2 (titles, captions, color schemes, etc.), and consider the following questions:
Note: There are no “right” answers to those questions! Or, rather, there are many different ways to answer each of them!