Introduction

This assignment is intended to give you an opportunity to experiment with the R-based GIS workflow demonstrated during Thursday’s seminar. The specific learning objectives are:

Use arcpullr to load data from an ArcGIS GeoService API endpoint
Produce simple maps with ggplot2/geom_sf and basemaps
Project data between CRSs
Use simple geometric operations to perform geospatial analyses
Use R functions to make a generic/repeatable workflow

You will make a series of maps during this assignment, using data from Portland, Oregon’s GIS data portal and the US Census Bureau.

Before You Begin

You will need to sign up for a key for the Census’s various APIs.

Cheat Sheet

Here are some of the commands you may need for this assignment:

`sf` (working with shapefiles and geometry)

st_read: Reads an ArcGIS shapefile (as well as many other kinds of geospatial data files)
st_crs: Access CRS information, either for a given object or (for standard numbered EPSG CRSs) by lookup number:
- st_crs(my_data): Ask my_data what its CRS is
- st_crs(3857): Look up EPSG 3857
st_transform: Change the CRS of a variable
st_bbox: Find the bounding box of a geometry
geom_sf/coord_sf: Work with SF geometries in ggplot

And here is a cheat sheet for assorted other st_* functions.
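
As a rough illustration of how these calls fit together, here is a minimal sketch. It uses the `nc.shp` sample shapefile that ships with the sf package as a stand-in; that file is an assumption made for the example and is not part of this assignment's data.

```r
library(sf)

# Sample shapefile bundled with sf (North Carolina counties); a stand-in only.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

st_crs(nc)                 # ask the object what its CRS is
web_merc <- st_crs(3857)   # look up a CRS by its EPSG number

# Re-project into EPSG 3857, then compare bounding boxes before and after
nc_3857 <- st_transform(nc, web_merc)
st_bbox(nc)        # coordinates in degrees (longitude/latitude)
st_bbox(nc_3857)   # coordinates in metres
```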

`arcpullr` (ArcGIS API)

get_spatial_layer: Loads an entire layer from the ArcGIS server; for some data sources, this may involve a large quantity of data
- Can include search criteria based on attributes of features (e.g. neighborhood name, facility category, etc.)
get_layer_by_poly: Loads an entire layer, but restricted to include features located within a particular geographic extent (polygon)
- Handy for queries like “all X’s within area Y”
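
The two query styles might be sketched as follows. The URLs are the ones listed in Part 1; the helper function names are hypothetical, and the `NAME` field used in the `where` clause is an assumption about the neighborhood layer that you should verify against its attribute table.

```r
library(arcpullr)

hoods_url <- "https://www.portlandmaps.com/od/rest/services/COP_OpenData_Boundary/MapServer/125/"
parks_url <- "https://www.portlandmaps.com/od/rest/services/COP_OpenData_Environment/MapServer/35/"

# Attribute search: `where` takes an SQL-style clause.
# ASSUMPTION: the neighborhood layer has a field called NAME.
get_one_neighborhood <- function(name) {
  get_spatial_layer(hoods_url, where = sprintf("NAME = '%s'", name))
}

# Spatial search: parks related to the polygon(s) in `area`
# (the exact spatial relationship is controlled by the sp_rel argument).
get_parks_near <- function(area) {
  get_layer_by_poly(parks_url, area)
}

# Usage (needs network access):
# foster_powell <- get_one_neighborhood("FOSTER-POWELL")
# nearby_parks  <- get_parks_near(foster_powell)
```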

`tidycensus` (Census Bureau API)

get_acs: Retrieves data from the American Community Survey 1- or 5-year snapshot
get_decennial: Retrieves data from the specified decennial census, if available
load_variables: Retrieves data dictionary for specified data product (acs, etc.)

Part 1: Simple Spatial Data

For this part of the assignment, you will work with Portland, Oregon’s GIS data portal and the arcpullr package. The datasets you will use are as follows:

“Neighborhood (Regions)”
- API URL: https://www.portlandmaps.com/od/rest/services/COP_OpenData_Boundary/MapServer/125/
“Parks”
- API URL: https://www.portlandmaps.com/od/rest/services/COP_OpenData_Environment/MapServer/35/
“Grocery Stores”
- API URL: https://www.portlandmaps.com/od/rest/services/COP_OpenData_ImportantPlaces/MapServer/40/

Note: I’ve included the relevant API URL for each dataset (for use with get_spatial_layer) as a convenience; you can find it on your own in the ArcGIS web UI as well, but a little bit of adjustment is needed to get it in the format that arcpullr expects.

Begin by first familiarizing yourself with the datasets and their contents. Possible questions to ask:

What are their attributes?
When were they last updated?
What limitations do you think they might have?
What kind of geometry do they contain (points, polygons, lines, etc.)?
What CRS are the data in?

Step A: Importing Data

Using arcpullr’s get_spatial_layer(), load the three datasets listed above; check the resulting data frames to ensure that the number of features and attributes are what you expect.
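
One way to organize this step is sketched below: keep the three URLs in a named vector, load each one, and summarize what came back. The `describe_layer` helper is hypothetical (written for this example, not part of arcpullr or sf).

```r
library(arcpullr)
library(sf)

layer_urls <- c(
  neighborhoods = "https://www.portlandmaps.com/od/rest/services/COP_OpenData_Boundary/MapServer/125/",
  parks         = "https://www.portlandmaps.com/od/rest/services/COP_OpenData_Environment/MapServer/35/",
  groceries     = "https://www.portlandmaps.com/od/rest/services/COP_OpenData_ImportantPlaces/MapServer/40/"
)

# Summarize feature count, attribute names, geometry type(s), and CRS of a layer.
describe_layer <- function(layer) {
  list(
    n_features = nrow(layer),
    attributes = setdiff(names(layer), attr(layer, "sf_column")),
    geometry   = unique(as.character(st_geometry_type(layer))),
    crs        = st_crs(layer)$input
  )
}

# Usage (needs network access):
# layers <- lapply(layer_urls, get_spatial_layer)
# lapply(layers, describe_layer)
```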

Step B: A Simple Map

Using ggplot2 and geom_sf, produce an overview map of the city’s neighborhood boundaries.
Next, add another layer showing the location of the city’s parks, colored dark green.
Finally, add a third layer, overlaying grocery stores, colored according to their “type” attribute.
- Hint: Remember that geom_sf listens for the fill and color aesthetics, just like geom_point() etc.
- Tip: Check out the “Existing
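
The layering pattern described above can be sketched with stand-in data. Nothing below uses the Portland layers: the sf sample counties play the role of neighborhoods, and made-up shapes play the role of parks and stores. The `type` column is a hypothetical attribute name.

```r
library(sf)
library(ggplot2)

# Stand-in data (NOT the Portland layers)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc <- st_transform(nc, 32119)   # a projected CRS for North Carolina, in metres

greens <- st_buffer(st_centroid(st_geometry(nc[1:5, ])), dist = 5000)
shops  <- st_sf(
  type     = rep(c("A", "B"), 5),   # hypothetical category attribute
  geometry = st_centroid(st_geometry(nc[6:15, ]))
)

overview_map <- ggplot() +
  geom_sf(data = nc, fill = NA, color = "grey40") +          # base layer: boundaries
  geom_sf(data = greens, fill = "darkgreen", color = NA) +   # fixed color: set outside aes()
  geom_sf(data = shops, aes(color = type))                   # mapped color: set inside aes()
```

Layers are drawn in the order they are added, so the boundaries go first and the points last.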

Your final product should look something like this:

Step C: Neighborhood Focus

Next, pick a neighborhood on which to focus your analysis (if you’re not familiar with Portland, try “Foster-Powell”).

Make a version of your map from step “B”, centered on your neighborhood of choice, including some reasonable amount of buffer space around the neighborhood’s borders, with the coordinates specified by hand.
- You get to decide what a “reasonable amount of buffer space” looks like
- Hint: Use coord_sf to control the bounds of your ggplot-based map.
- Hint: Remember that you can use dplyr’s filter command with sf objects.
- Tip: Besides “eyeballing”, you may want to explore the basemaps package’s draw_ext() function
- Tip: Check out geom_sf_label(), which will make map labels based on feature attributes (e.g. neighborhood names; note that the neighborhood boundary dataset has an attribute named MAPLABEL for this exact purpose).
Now, make a version of this map where the coordinates for coord_sf are specified programmatically (i.e., without you having to type them in or figure them out by hand).
- You may use any of the methods we covered in the lecture:
  - “Find the centroid, expand out”
  - “Bounding box plus some buffer”
  - “Include touching neighborhoods”
- Or, if you feel adventurous, check out st_buffer()
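
As one possible reading of the “bounding box plus some buffer” idea, the sketch below pads a bounding box by a fraction of its own size and, as an alternative, lets st_buffer() do the expanding. It uses the sf sample counties as a stand-in, with Wake County playing the role of the chosen neighborhood; the 25% padding and 5 km buffer are arbitrary choices.

```r
library(sf)
library(ggplot2)

nc <- st_transform(
  st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE),
  32119   # projected CRS with metre units
)
focus <- nc[nc$NAME == "Wake", ]   # stand-in for the chosen neighborhood

# Bounding box, padded on every side by 25% of the box's width/height
bb  <- st_bbox(focus)
pad <- 0.25
x_range <- c(bb[["xmin"]], bb[["xmax"]]) + c(-1, 1) * pad * (bb[["xmax"]] - bb[["xmin"]])
y_range <- c(bb[["ymin"]], bb[["ymax"]]) + c(-1, 1) * pad * (bb[["ymax"]] - bb[["ymin"]])

focus_map <- ggplot() +
  geom_sf(data = nc) +
  geom_sf(data = focus, fill = "orange") +
  coord_sf(xlim = x_range, ylim = y_range)

# Alternative: buffer the geometry itself (5 km here), then take its bounding box
bb_buffered <- st_bbox(st_buffer(focus, dist = 5000))
```

The limits passed to coord_sf must be in the same CRS as the plotted data, which is why the bounding box is computed after re-projecting.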

Step D: Automation

Finally, write an R function that takes as an argument the name of a neighborhood, and automatically produces a version of the map from Step “C”. It might look something like this:

make.neighborhood.map <- function(some.neighborhood.name) {
  # Stuff happens here
}

And then the idea is that one could call it like so:

make.neighborhood.map("FOSTER-POWELL")
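
A generic version of such a function might look like the sketch below. It is only one way to structure it: the function name, its arguments, and the default 5 km buffer are all hypothetical, and it is demonstrated on the sf sample counties rather than on the Portland neighborhoods.

```r
library(sf)
library(ggplot2)

# areas:       sf polygons in a projected CRS
# area_name:   value to look for in the name column
# name_col:    column holding area names
# buffer_dist: margin around the focus area, in CRS units
make_focus_map <- function(areas, area_name, name_col = "NAME", buffer_dist = 5000) {
  focus <- areas[areas[[name_col]] == area_name, ]
  if (nrow(focus) == 0) {
    stop("No area named '", area_name, "' found")
  }
  bb <- st_bbox(st_buffer(focus, dist = buffer_dist))
  ggplot() +
    geom_sf(data = areas, fill = "grey95") +
    geom_sf(data = focus, fill = NA, color = "red") +
    geom_sf_label(data = focus, aes(label = .data[[name_col]])) +
    coord_sf(
      xlim = as.numeric(bb[c("xmin", "xmax")]),
      ylim = as.numeric(bb[c("ymin", "ymax")])
    ) +
    labs(title = area_name)
}

# Demonstration on stand-in data
nc <- st_transform(
  st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE),
  32119
)
wake_map <- make_focus_map(nc, "Wake")
```

Failing loudly on an unknown name is a deliberate choice here: a misspelled neighborhood would otherwise produce an empty or misleading map.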

Part 2: Census API

Before you can use the Census API and the tidycensus library, you must register for an API key (see link in the “Before You Begin” section). Once you have done this, run the following chunk of code once per session, just after loading the tidycensus package, like so:

library(tidycensus)
options(tigris_use_cache = TRUE) # don't re-download geometry un-necessarily
census_api_key("YOUR_CENSUS_KEY_GOES_HERE")

Step A: Choosing Variables

Spend some time with the data dictionaries for both the 2020 5-year American Community Survey dataset and the 2010 Decennial summary file dataset (the 2020 data is only just now starting to appear, and the summary files are not ready yet):

load_variables(2020, "acs5/profile") # ACS
load_variables(2010, "sf1") # 2010 Decennial

Note: Some of the 2020 data is available; the “Public Law” dataset (used for congressional redistricting) has been released. This dataset focuses largely on basic population counts and simplified racial/ethnic demographics, and is quite useful for many analytical questions.

There are a very large number of variables in the ACS and Decennial datasets, so as you familiarize yourself with the data, you may find it useful to do so using RStudio’s View command, like so; this will let you search for keywords of interest (“race”, “housing”, etc.):

load_variables(2020, "acs5/profile") %>% View
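
If you prefer to search from code rather than through the viewer, a small helper along these lines may be useful. It is a sketch: `search_variables` is a hypothetical name, and it assumes the data dictionary has `label` and `concept` columns, as load_variables() results do.

```r
library(dplyr)
library(stringr)

# Keep rows of a data dictionary whose label or concept mentions `keyword`
# (case-insensitive).
search_variables <- function(vars, keyword) {
  pattern <- regex(keyword, ignore_case = TRUE)
  filter(vars, str_detect(label, pattern) | str_detect(concept, pattern))
}

# Usage (needs tidycensus and network access):
# acs_vars <- tidycensus::load_variables(2020, "acs5/profile")
# search_variables(acs_vars, "housing")
```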

As another option, the Census maintains detailed documentation on all of its available datasets; that page is a bit overwhelming, so here are shortcuts to the lists of categories of variables for the datasets (click “selected variables” to expand a group):

Once you’ve got a sense of what data are available, pick a set of variables to focus on. For example, you might focus on demographic characteristics (proportion of the population belonging to a particular group, or having a particular ancestry), or you might be more interested in questions of housing, employment, etc.

What variable(s) did you choose, and why? What can you find out about the way that the Census defines, collects, and reports this sort of data?

Depending on the question that has caught your interest, you may wish to look for previous years’ data as well, in case longitudinal change is of interest to you.

Step B: Simple Mapping

Choose one of the following:

Pick a US state, and use tidycensus’s get_acs or get_decennial functions to pull down county-level data for your variable of interest.
Pick a county (or group of counties) and pull down tract-level data for your variable of interest.

(If you would rather work on a geography other than state or county, feel free to do so.)

Either way, make a choropleth map using geom_sf and its fill aesthetic.

Hint: Remember that the tidycensus functions can return spatial as well as numeric data, by setting geometry=TRUE
Things to pay particular attention to:
- Are you using absolute numbers when you should be using percentages (or vice versa)?
- Are the statistical margins of error sensible/usable for the analysis that you are trying to do?
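
One common way to judge whether a margin of error is usable is the coefficient of variation (CV): the standard error divided by the estimate. The sketch below assumes a data frame with `estimate` and `moe` columns, as returned by get_acs(), and relies on ACS margins of error being published at the 90% confidence level (so the standard error is the margin divided by 1.645). The `add_cv` helper is hypothetical, and the toy numbers are made up for illustration.

```r
# Add a coefficient of variation (CV) column to a get_acs()-style data frame.
add_cv <- function(acs, z = 1.645) {
  se <- acs$moe / z   # convert a 90% margin of error to a standard error
  acs$cv <- ifelse(acs$estimate > 0, se / acs$estimate, NA_real_)
  acs
}

# Toy values for illustration only (not real Census figures)
toy <- data.frame(
  GEOID    = c("a", "b"),
  estimate = c(1000, 50),
  moe      = c(164.5, 41.125)
)
add_cv(toy)
```

In this toy example the second row has a much larger CV than the first, which is the kind of estimate you might want to treat with caution; where to draw the line is a judgment call.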

Step C: Polishing

State-specific Projection

Try to find a map projection specific to the state that you are mapping (e.g. “Oregon Lambert” for Oregon); use st_transform to re-project your data into that new projection before plotting it. If you are making a tract-level map of a specific county or set of counties, you can still use their home state’s projection.

Where to find projections:

Your state-of-interest’s state surveyor’s office probably specifies it on their web page
You can also search the EPSG GeoRepository for mentions of your state; you’ll probably find a few different ones, and any one that you pick will be an improvement over the default.
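
Once you have an EPSG number, the re-projection itself is short. The sketch below uses the sf sample counties and EPSG 32119 (“NAD83 / North Carolina”) as a stand-in for whichever state and projection you choose.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Look up the state-specific CRS and inspect it before using it
state_crs <- st_crs(32119)
state_crs$Name   # human-readable name of the CRS

nc_state <- st_transform(nc, state_crs)

st_is_longlat(nc)        # TRUE: the original data are in degrees
st_is_longlat(nc_state)  # FALSE: now in projected units
```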

Colors & Theme

The default colors are probably not appropriate for your data; use what you have learned earlier in the program about data visualization and ggplot, and give that aspect of your map some attention.

Tip: Think about whether a continuous or discrete color scale makes the most sense for your data.
Remember that ggplot has themes, e.g. theme_bw, any of the ggthemes choices, etc.
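
To compare the two kinds of scale side by side, something like the following sketch may help. It uses the `BIR74` column of the sf sample counties as a stand-in variable; the viridis palette, the number of breaks, and the themes are arbitrary choices.

```r
library(sf)
library(ggplot2)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

base_map <- ggplot(nc) +
  geom_sf(aes(fill = BIR74), color = "white")

# Continuous scale: a smooth gradient across the full range of values
continuous_map <- base_map +
  scale_fill_viridis_c(name = "BIR74") +
  theme_bw()

# Binned scale: the same values grouped into a handful of classes
binned_map <- base_map +
  scale_fill_viridis_b(name = "BIR74", n.breaks = 5) +
  theme_minimal()
```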

Labels & Scalebar

It is important for maps (and all data visualizations) to have appropriate labels, to help provide the viewer with context. In this part of the assignment, take a moment to add a title, and, if appropriate, a subtitle or caption to your map, providing context about what it is showing and where the data came from. For maps, it is also good practice to include information about the scale of the map; the annotation_scale geom from ggspatial can be used to do so:

my.previous.example.map + annotation_scale()
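
A slightly fuller sketch, combining labels with a scale bar, is shown below. It again uses the sf sample counties as a stand-in; the label text and the scale bar placement are placeholders to adapt.

```r
library(sf)
library(ggplot2)
library(ggspatial)

nc <- st_transform(
  st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE),
  32119
)

labelled_map <- ggplot(nc) +
  geom_sf(aes(fill = BIR74)) +
  labs(
    title    = "Example map title",
    subtitle = "Example subtitle giving context",
    caption  = "Data: nc.shp sample dataset from the sf package",
    fill     = "BIR74"
  ) +
  annotation_scale(location = "bl", width_hint = 0.3)   # "bl" = bottom left
```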

Part 3: Combining and Intersecting

Now let’s combine our City of Portland data and our Census data. Portland’s demographic geography has a very long and complex history, and it has only become more so in recent years; suffice it to say that racial and ethnic groups are not uniformly distributed across the city.

What else is not uniformly distributed? Grocery stores! In the last part of this assignment, we will overlay our grocery store data over demographic geography, and see what patterns emerge.

Step A: Obtain ACS Data

Using get_acs(), pull down tract-level data for Multnomah County, Oregon. We will use the 2020 ACS 5-year profile dataset, and use variable DP05_0065P (“Race alone or in combination with one or more other races/Total population/Black or African American”, percentage form). Make sure to pull down the geometry for the county’s census tracts!

Note: As mentioned during the lecture, this is a very naïve way to be using ACS data; it’s adequate for the purposes of this assignment, but please be careful following this specific recipe in your own “real-world” work.

Make a quick map; what patterns do you notice?
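
The request described above might be written as follows. The function names are hypothetical; the arguments simply restate the geography, dataset, and variable named in this step. It assumes your API key has already been set with census_api_key().

```r
library(tidycensus)
library(ggplot2)

# Tract-level DP05_0065P for Multnomah County, with tract geometry attached
get_multnomah_tracts <- function() {
  get_acs(
    geography = "tract",
    state     = "OR",
    county    = "Multnomah",
    variables = "DP05_0065P",
    year      = 2020,
    survey    = "acs5",
    geometry  = TRUE
  )
}

# Quick look: fill each tract by its estimate
quick_map <- function(acs) {
  ggplot(acs) +
    geom_sf(aes(fill = estimate), color = NA)
}

# Usage (needs network access and an API key):
# multnomah <- get_multnomah_tracts()
# quick_map(multnomah)
```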

Step B: Restrict to a Geographic Region

Look at your plots of Portland’s neighborhoods from earlier in the assignment; compare with the county-level map you just made. Note that the county covers much more geographic area than the city itself!

Our grocery store dataset only extends to the city limits; for our map of Census data to match, we must somehow exclude all of the census tracts that are outside the city.

To do this, we will intersect our census tract geometry with the geometry representing the city’s different neighborhoods. st_intersection() will take two geometries, X and Y, and compute a new version of X that only includes the shared geometry.

Using st_intersection() produce a version of the ACS data from Step “A” that only includes tracts from the city itself, and excludes anything from the rest of Multnomah County. Plot the result.
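
To see what st_intersection() does before applying it to real data, here is a toy sketch with made-up squares: four “tracts” tiling a 2 x 2 area, and a 1 x 1 “city” in the middle. The CRS is arbitrary; any projected CRS would do for this example.

```r
library(sf)

crs <- st_crs(32119)   # arbitrary projected CRS for the toy geometry

# Four 1 x 1 "tracts" tiling a 2 x 2 square
county_box <- st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 2, ymax = 2), crs = crs))
tracts <- st_sf(tract_id = 1:4, geometry = st_make_grid(county_box, n = c(2, 2)))

# A 1 x 1 "city" in the middle, overlapping one quarter of each tract
city <- st_as_sfc(st_bbox(c(xmin = 0.5, ymin = 0.5, xmax = 1.5, ymax = 1.5), crs = crs))

# Both layers must share a CRS; st_transform() one of them first if they do not
stopifnot(st_crs(tracts) == st_crs(city))

# Each tract is clipped to the part that falls inside the city;
# the tract attributes (here, tract_id) are carried along
city_tracts <- st_intersection(tracts, city)
st_area(city_tracts)
```

With the real data, consider merging the neighborhood polygons into a single city outline with st_union() first. Otherwise each tract is cut into one piece per neighborhood it overlaps, and the neighborhood attributes are attached to every piece.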

Note: You may notice that the census tracts look quite similar to the city’s administrative neighborhoods; this is not an accident: the Census makes use of pre-existing administrative and geographic boundaries as starting points when defining its tracts.

Step C: Put it all Together!

Take the resulting map from Step “B”, and overlay the grocery store data that you worked with in Part 1. For this final product, pay attention to the same questions of “polish” that you worked with in Part 2, Step “C” (titles, captions, color schemes, etc.), and consider the following questions:

Do any patterns appear in the distribution of grocery stores and in Portland’s Black-identifying population?
What additional information and context might be helpful for exploring this topic?
Thinking more generally, how might you investigate this question in a more quantitative manner (beyond visual inspection of a map)?

Note: There are no “right” answers to those questions! Or, rather, there are many different ways to answer each of them!
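
As one possible starting point for a more quantitative look (among many), you could count the stores that fall inside each tract. The sketch below does this with made-up squares and points rather than the real layers.

```r
library(sf)

crs <- st_crs(32119)   # arbitrary projected CRS for the toy geometry

# Four 1 x 1 "tracts" tiling a 2 x 2 square
box <- st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 2, ymax = 2), crs = crs))
tracts <- st_sf(tract_id = 1:4, geometry = st_make_grid(box, n = c(2, 2)))

# Three made-up "stores": two in the lower-left cell, one in the upper-right
stores <- st_as_sf(
  data.frame(x = c(0.2, 0.7, 1.5), y = c(0.3, 0.6, 1.5)),
  coords = c("x", "y"),
  crs = crs
)

# st_intersects() lists, for each tract, the indices of the stores inside it;
# lengths() turns that list into a count per tract
tracts$n_stores <- lengths(st_intersects(tracts, stores))
```

A per-tract count like this could then be set alongside the ACS estimate for the same tract, keeping in mind that raw counts ignore differences in tract population and area.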

GIS Workshop Assignment

Steven Bedrick

2024-12-09
