---
title: "Data Visualization"
subtitle: "Lab 2"
date: "October 6, 2016"
author: "Alison Presmanes Hill"
output:
  html_document:
    keep_md: TRUE
    highlight: pygments
    theme: journal
    smart: false
    toc: TRUE
    toc_float: TRUE
    number_sections: TRUE
---

```{r setup, include = FALSE, cache = FALSE}
# knitr's chunk options are singular: `warning` and `message`
knitr::opts_chunk$set(error = TRUE, comment = NA, warning = FALSE, message = FALSE, tidy = FALSE, eval = TRUE)
```

```{r load-packages, include = FALSE}
suppressWarnings(suppressMessages(library(tidyverse)))
suppressWarnings(suppressMessages(library(gapminder)))
suppressWarnings(suppressMessages(library(forcats)))
```

# Axes

Let's start with a simple boxplot.

```{r}
le_box <- ggplot(gapminder, aes(x = continent, y = lifeExp)) +
  geom_boxplot()
le_box
```

## Swap X and Y axes

```{r}
le_box + coord_flip()
```

## Discrete axis: change order

First, a brief digression into factor exploration.

```{r}
glimpse(gapminder)
str(gapminder$continent)
levels(gapminder$continent)
class(gapminder$continent)
summary(gapminder$continent)
```

In `dplyr`:

```{r}
gapminder %>% count(continent)
```

Using `forcats`:

```{r}
# library(forcats)
fct_count(gapminder$continent)
```

Default order is alphabetical. To enforce other orders...

1. Order by frequency, forwards and backwards.

```{r}
## order by frequency
gapminder$continent %>% levels()
gapminder$continent %>% fct_infreq() %>% levels() %>% head()
## backwards!
gapminder$continent %>% fct_infreq() %>% fct_rev() %>% levels() %>% head()
```

Let's use this in a plot now.

```{r}
ggplot(gapminder, aes(x = fct_infreq(continent), y = lifeExp)) +
  geom_boxplot()
```

2. Order by another variable, forwards and backwards. This other variable is usually quantitative, and you will order the factor according to a grouped summary. The factor is the grouping variable and the default summarizing function is `median()`.

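
To make the "grouped summary" idea concrete, here is a minimal sketch that computes the per-continent medians by hand with `dplyr` and compares their order with what `fct_reorder()` returns by default. The object names `continent_medians` and `continent_by_median` are made up for this illustration.

```{r}
library(dplyr)
library(forcats)
library(gapminder)

# Median life expectancy per continent, sorted from lowest to highest
continent_medians <- gapminder %>%
  group_by(continent) %>%
  summarize(median_lifeExp = median(lifeExp)) %>%
  arrange(median_lifeExp)
continent_medians

# fct_reorder() with its default summary function (median)
continent_by_median <- fct_reorder(gapminder$continent, gapminder$lifeExp)
levels(continent_by_median)
```

The rows of `continent_medians` and the levels of `continent_by_median` should appear in the same order, since both sort continents by median life expectancy.
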
```{r}
fct_reorder(gapminder$country, gapminder$lifeExp) %>% levels() %>% head()
## order according to minimum life exp instead of median
fct_reorder(gapminder$country, gapminder$lifeExp, min) %>% levels() %>% head()
## backwards!
fct_reorder(gapminder$country, gapminder$lifeExp, .desc = TRUE) %>% levels() %>% head()
```

Let's use this in a plot now!

```{r}
ggplot(gapminder, aes(x = fct_reorder(continent, lifeExp), y = lifeExp)) +
  geom_boxplot()
```

Why reorder factor levels? It often makes plots much better! When a factor is mapped to x or y, it should almost always be reordered by the quantitative variable you are mapping to the other axis.

```{r}
gap_asia_2007 <- gapminder %>%
  filter(year == 2007, continent == "Asia")
ggplot(gap_asia_2007, aes(x = lifeExp, y = country)) +
  geom_point()
ggplot(gap_asia_2007, aes(x = lifeExp, y = fct_reorder(country, lifeExp))) +
  geom_point()
```

Use `fct_reorder2()` when you have a line chart of a quantitative x against another quantitative y and your factor provides the color. This way the legend appears in the same order as the data!

```{r}
h_countries <- c("Egypt", "Haiti", "Romania", "Thailand", "Venezuela")
h_gap <- gapminder %>%
  filter(country %in% h_countries) %>%
  droplevels()
ggplot(h_gap, aes(x = year, y = lifeExp, color = country)) +
  geom_line()
ggplot(h_gap, aes(x = year, y = lifeExp, color = fct_reorder2(country, year, lifeExp))) +
  geom_line() +
  labs(color = "country")
```

3. 
Order by strong feelings on the matter.

```{r}
h_gap$country %>% fct_relevel("Romania", "Haiti") %>% levels()
```

You can also use base R tricks with `scale_x_discrete()`:

```{r}
# Manually set the order of a discrete-valued axis
le_box + scale_x_discrete(limits = c("Asia", "Oceania", "Africa", "Europe", "Americas"))
# Reverse the order of a discrete-valued axis
# Get the levels of the factor
clevels <- levels(gapminder$continent)
clevels
clevels <- rev(clevels)
clevels
le_box + scale_x_discrete(limits = clevels)
# Or it can be done in one line:
le_box + scale_x_discrete(limits = rev(levels(gapminder$continent)))
```

## Tick mark labels

For discrete variables, the tick mark labels are taken directly from the levels of the factor. However, sometimes the factor levels have short names that aren’t suitable for presentation.

```{r}
le_box + scale_x_discrete(breaks = c("Asia", "Oceania", "Africa", "Europe", "Americas"),
                          labels = c("Asia", "Oceanica", "Africa", "Europe", "Americas"))
```

```{r}
# Hide x tick marks, labels, and grid lines
le_box + scale_x_discrete(breaks = NULL)
```

## Continuous: change limits

Very important: avoid using `xlim` or `ylim` to control the limits of your plot if you are making a boxplot. Example:

```{r}
le_box + ylim(40, 80)
```

If the y range is reduced using the method above, the data outside the range are dropped before the boxplot statistics are computed. This might be OK for a scatterplot, but it can be problematic for the boxplots used here, because the boxes then summarize only the remaining data. Instead, use `coord_cartesian()`: rather than setting the limits of the data, it sets the viewing area of the plot.

```{r}
# Using coord_cartesian "zooms" into the area
le_box + coord_cartesian(ylim = c(40, 80))
```

You can also make sure the axis includes particular values, for example 0 as a minimum.

```{r}
le_box + expand_limits(y = c(0, 100))
# le_box + scale_y_continuous(limits=c(0, 100))
```

## Axis labels/text formatting

```{r}
le_box +
  scale_x_discrete(name = "Continent") +
  scale_y_continuous(name = "Life Expectancy in Years")
```

You can also take away an axis title.

```{r}
le_box +
  scale_x_discrete(name = "") +
  scale_y_continuous(name = "Life Expectancy in Years")
```

Both of these can also be accomplished with the `xlab`/`ylab` shorthand if all you want to do is change the axis labels.

```{r}
le_box + xlab("") + ylab("Life Expectancy in Years")
# or even
# le_box + labs(x = "", y = "Life Expectancy in Years")
```

## Specify tick marks

```{r}
# Specify tick marks directly
le_box + scale_y_continuous(breaks = seq(20, 80, 10)) # Ticks from 20-80, every 10 years
```

# Gridlines/themes

## Hide gridlines

```{r}
# Hide all the gridlines
le_box + theme(panel.grid.minor = element_blank(), panel.grid.major = element_blank())
# Hide just the minor gridlines
le_box + theme(panel.grid.minor = element_blank())
```

Or hide just the vertical or horizontal ones.

```{r}
# Hide all the vertical gridlines
le_box + theme(panel.grid.minor.x = element_blank(), panel.grid.major.x = element_blank())
# Hide all the horizontal gridlines
le_box + theme(panel.grid.minor.y = element_blank(), panel.grid.major.y = element_blank())
```

If you don't like the default `theme_grey`, try one of the other built-in themes:

```{r}
le_box + theme_bw()
le_box + theme_minimal()
le_box + theme_classic()
le_box + theme_dark()
```

See more here: [http://docs.ggplot2.org/current/ggtheme.html](http://docs.ggplot2.org/current/ggtheme.html)

You can also change individual theme elements. For example, the seaborn library in Python uses "#EAEAF2" as the fill. Let's change the background to that.

```{r}
le_box + theme(panel.background = element_rect(fill = "#EAEAF2"))
```

You can also save this to use over and over again in one doc...

```{r eval = FALSE}
sb_back <- ggplot2::theme(
  panel.background = element_rect(fill = "#EAEAF2")
)
le_box + sb_back
```

Change color of gridlines:

* `panel.grid` for all
* `panel.grid.major` for just major (add `.x` or `.y` to target one axis)
* `panel.grid.minor` for just minor (add `.x` or `.y` to target one axis)

All the arguments for `element_line` are documented here: [http://docs.ggplot2.org/current/element_line.html](http://docs.ggplot2.org/current/element_line.html)

Here are the linetype options: [http://docs.ggplot2.org/current/vignettes/ggplot2-specs.html](http://docs.ggplot2.org/current/vignettes/ggplot2-specs.html)

```{r}
?element_line
```

Combinations are endless!

```{r}
# Note: ggplot2 3.4.0 and later use `linewidth` in place of `size` here
le_box + theme(panel.grid.major = element_line(colour = "navy", linetype = 6, size = .1))
```

# Legends

Let's add some color to our simple boxplot.

```{r}
le_box <- ggplot(gapminder, aes(x = continent, y = lifeExp, fill = continent)) +
  geom_boxplot()
le_box
```

Don't worry, we'll play with actual colors next week! For now, we'll focus on the legend that appeared.

## Remove a legend

```{r}
# Remove legend for a particular aesthetic (fill)
# ("none" replaces the older FALSE, which newer ggplot2 versions deprecate)
le_box + guides(fill = "none")
# It can also be done when specifying the scale
le_box + scale_fill_discrete(guide = "none")
# This removes all legends if you have more than 1
le_box + theme(legend.position = "none")
```

## Change order of legend

Remember how we changed the order of a discrete axis using `scale_x_discrete` (or `scale_y_discrete`)? Now we introduce the counterpart for other aesthetics.

```{r}
le_box + scale_fill_discrete(breaks = c("Oceania", "Europe", "Asia", "Africa", "Americas"))
```

## Hide legend title

```{r}
# Remove title for fill legend
le_box + guides(fill = guide_legend(title = NULL))
# Remove title for all legends
le_box + theme(legend.title = element_blank())
```

## Modify legend text

```{r}
le_box + scale_fill_discrete(name = "The\nContinent")
le_box + scale_fill_discrete(name = "The\nContinent",
                             breaks = c("Oceania", "Europe", "Asia", "Africa", "Americas"),
                             labels = c("Oceanica", "Europe", "Asia", "Africa", "The Americas"))
```

Note that this didn’t change the x axis labels (we know how to do this though).

## Legend box

## Legend position

```{r}
le_box + theme(legend.position = "top")
le_box + theme(legend.position = "bottom")
# Set the "anchoring point" of the legend (bottom-left is 0,0; top-right is 1,1)
# Put bottom-left corner of legend box in bottom-left corner of graph
le_box + theme(legend.justification = c(0, 0), legend.position = c(0, 0))
# Put bottom-right corner of legend box in bottom-right corner of graph
le_box + theme(legend.justification = c(1, 0), legend.position = c(1, 0))
```

Future topics that involve even more tweaking:

* plot annotations via `ggrepel`, `geom_text`, `annotate`

Next week:

* colors!
* more colors via packages
* more themes via packages
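
As a recap of the legend section, here is a minimal sketch that combines several of the tweaks from this lab in one plot: a custom legend title, a custom legend order with relabeled entries, and a legend moved below the panel. The names `legend_breaks`, `legend_labels`, and `le_box_recap` are hypothetical and exist only for this illustration; the label text is likewise just an example.

```{r}
library(ggplot2)
library(gapminder)

# Keep the legend order and the display labels side by side so they stay in sync
legend_breaks <- c("Oceania", "Europe", "Asia", "Africa", "Americas")
legend_labels <- c("Oceania", "Europe", "Asia", "Africa", "The Americas")

le_box_recap <- ggplot(gapminder, aes(x = continent, y = lifeExp, fill = continent)) +
  geom_boxplot() +
  # Legend title, order, and text are all set on the fill scale
  scale_fill_discrete(name = "Continent", breaks = legend_breaks, labels = legend_labels) +
  # Legend placement is a theme setting
  theme(legend.position = "bottom")

le_box_recap
```

Storing the breaks and labels as named vectors is only a convenience: `breaks` and `labels` are matched by position, so keeping them next to each other makes a mismatch easier to spot.
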