1 Axes

Let's start with a simple boxplot of life expectancy by continent.

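# Packages used throughout (skip if already loaded)
library(ggplot2)
library(dplyr)
library(gapminder)
le_box <- ggplot(gapminder, aes(x = continent, y = lifeExp)) +
  geom_boxplot() 
le_box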

1.1 Swap X and Y axes

le_box + coord_flip()

1.2 Discrete axis: change order

Brief digression into factor exploration

glimpse(gapminder)
Observations: 1,704
Variables: 6
$ country   (fctr) Afghanistan, Afghanistan, Afghanistan, Afghanistan,...
$ continent (fctr) Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asi...
$ year      (int) 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992...
$ lifeExp   (dbl) 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.8...
$ pop       (int) 8425333, 9240934, 10267083, 11537966, 13079460, 1488...
$ gdpPercap (dbl) 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 78...
str(gapminder$continent)
 Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
levels(gapminder$continent)
[1] "Africa"   "Americas" "Asia"     "Europe"   "Oceania" 
class(gapminder$continent)
[1] "factor"
summary(gapminder$continent)
  Africa Americas     Asia   Europe  Oceania 
     624      300      396      360       24 

The same counts using dplyr:

gapminder %>% 
  count(continent)
Source: local data frame [5 x 2]

  continent     n
     (fctr) (int)
1    Africa   624
2  Americas   300
3      Asia   396
4    Europe   360
5   Oceania    24

Using forcats

# library(forcats)
fct_count(gapminder$continent)
Source: local data frame [5 x 2]

         f     n
    (fctr) (int)
1   Africa   624
2 Americas   300
3     Asia   396
4   Europe   360
5  Oceania    24

By default, factor levels are ordered alphabetically. To impose a different order:

  1. Order by frequency, forwards and backwards.
## order by frequency
gapminder$continent %>%
  levels()
[1] "Africa"   "Americas" "Asia"     "Europe"   "Oceania" 
gapminder$continent %>% 
  fct_infreq() %>%
  levels() %>% head()
[1] "Africa"   "Asia"     "Europe"   "Americas" "Oceania" 
## backwards!
gapminder$continent %>% 
  fct_infreq() %>%
  fct_rev() %>% 
  levels() %>% head()
[1] "Oceania"  "Americas" "Europe"   "Asia"     "Africa"  

Let's use this in a plot now.

ggplot(gapminder, aes(x = fct_infreq(continent), y = lifeExp)) +
  geom_boxplot()
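
To make the frequency ordering less of a black box, here is a minimal base R sketch of the same idea. The toy `continent` vector is made up for illustration, and this is only an approximation of what fct_infreq() and fct_rev() do, not their actual implementation.

```r
# Toy data (hypothetical), standing in for gapminder$continent
continent <- factor(c("Asia", "Europe", "Africa", "Europe", "Asia", "Asia"))
levels(continent)   # alphabetical by default

# Count each level, sort the counts from largest to smallest,
# then rebuild the factor with its levels in that order
freq <- sort(table(continent), decreasing = TRUE)
by_freq <- factor(continent, levels = names(freq))
levels(by_freq)

# Reverse the level order, similar in spirit to fct_rev()
by_freq_rev <- factor(by_freq, levels = rev(levels(by_freq)))
levels(by_freq_rev)
```

Only the level order changes; the values stored in the factor stay the same.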

  2. Order by another variable, forwards and backwards. This other variable is usually quantitative, and you order the factor according to a grouped summary of it. The factor is the grouping variable, and the default summarizing function is median().
fct_reorder(gapminder$country, gapminder$lifeExp) %>% 
  levels() %>% head()
[1] "Sierra Leone"  "Guinea-Bissau" "Afghanistan"   "Angola"       
[5] "Somalia"       "Guinea"       
## order according to minimum life exp instead of median
fct_reorder(gapminder$country, gapminder$lifeExp, min) %>% 
  levels() %>% head()
[1] "Rwanda"       "Afghanistan"  "Gambia"       "Angola"      
[5] "Sierra Leone" "Cambodia"    
## backwards!
fct_reorder(gapminder$country, gapminder$lifeExp, .desc = TRUE) %>% 
  levels() %>% head()
[1] "Iceland"     "Japan"       "Sweden"      "Switzerland" "Netherlands"
[6] "Norway"     

Let's use this in a plot now!

ggplot(gapminder, aes(x = fct_reorder(continent, lifeExp), 
                      y = lifeExp)) +
  geom_boxplot()
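
If forcats is not available, base R's reorder() does something similar. The sketch below uses a made-up data frame (`df`, `group`, and `value` are hypothetical names). Note that reorder() summarizes with mean() by default, so median is passed explicitly here to mirror fct_reorder().

```r
# Toy data (hypothetical): three groups with three values each
df <- data.frame(
  group = factor(c("a", "a", "a", "b", "b", "b", "c", "c", "c")),
  value = c(10, 50, 12,   1, 2, 3,   20, 21, 22)
)

# Order levels by the per-group median (a = 12, b = 2, c = 21)
by_median <- reorder(df$group, df$value, FUN = median)
levels(by_median)

# Swap in a different summary function, here the per-group maximum
by_max <- reorder(df$group, df$value, FUN = max)
levels(by_max)

# Descending order: negate the values before summarizing
by_median_desc <- reorder(df$group, -df$value, FUN = median)
levels(by_median_desc)
```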

Why bother reordering factor levels? It often makes plots much easier to read. When a factor is mapped to x or y, it should almost always be reordered by the quantitative variable you are mapping to the other axis. Compare the next two plots.

gap_asia_2007 <- gapminder %>% filter(year == 2007, continent == "Asia")
ggplot(gap_asia_2007, aes(x = lifeExp, y = country)) + geom_point()

ggplot(gap_asia_2007, aes(x = lifeExp, y = fct_reorder(country, lifeExp))) +
  geom_point()

Use fct_reorder2() when you have a line chart of a quantitative x against a quantitative y and your factor provides the color. This way the legend appears in the same order as the data (the lines at the right edge of the plot)!

h_countries <- c("Egypt", "Haiti", "Romania", "Thailand", "Venezuela")
h_gap <- gapminder %>%
  filter(country %in% h_countries) %>% 
  droplevels()
ggplot(h_gap, aes(x = year, y = lifeExp, color = country)) +
  geom_line()

ggplot(h_gap, aes(x = year, y = lifeExp,
                  color = fct_reorder2(country, year, lifeExp))) +
  geom_line() +
  labs(color = "country")
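
As a rough sketch of the logic behind this (an approximation in base R, not the forcats source), the default behaviour of fct_reorder2() can be thought of as: for each level, take the y value at the largest x, then sort the levels by that value in descending order. The toy data frame below is made up for illustration.

```r
# Toy data (hypothetical): three countries observed in three years
df <- data.frame(
  country = factor(rep(c("A", "B", "C"), each = 3)),
  year    = rep(c(1990, 2000, 2010), times = 3),
  lifeExp = c(60, 62, 64,   70, 72, 75,   50, 65, 71)
)

# For each country, pick lifeExp at the latest year
last_y <- sapply(split(df, df$country),
                 function(d) d$lifeExp[which.max(d$year)])

# Sort descending, so the line that ends highest comes first in the legend
ordered_country <- factor(df$country,
                          levels = names(sort(last_y, decreasing = TRUE)))
levels(ordered_country)
```

Country C starts lowest but ends above A, so it moves ahead of A in the legend.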

  3. Order by strong feelings on the matter
h_gap$country %>% fct_relevel("Romania", "Haiti") %>% levels()
[1] "Romania"   "Haiti"     "Egypt"     "Thailand"  "Venezuela"
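
For comparison, a minimal base R sketch of the same "move these levels to the front" operation. The `country` vector is rebuilt by hand here so the example stands alone, and `front` is a hypothetical helper name.

```r
# Same five countries as h_gap$country, built by hand for a standalone example
country <- factor(c("Egypt", "Haiti", "Romania", "Thailand", "Venezuela"))

# Levels to move to the front, in the order we want them
front <- c("Romania", "Haiti")

# Put the chosen levels first and keep the remaining ones in their existing order
releveled <- factor(country,
                    levels = c(front, setdiff(levels(country), front)))
levels(releveled)
```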

You can also set the order on the scale itself, with a little help from base R:

# Manually set the order of a discrete-valued axis
le_box + scale_x_discrete(limits = c("Asia","Oceania","Africa","Europe", "Americas"))

# Reverse the order of a discrete-valued axis
# Get the levels of the factor
clevels <- levels(gapminder$continent)
clevels
[1] "Africa"   "Americas" "Asia"     "Europe"   "Oceania" 
clevels <- rev(clevels)
clevels
[1] "Oceania"  "Europe"   "Asia"     "Americas" "Africa"  
le_box + scale_x_discrete(limits = clevels)

# Or it can be done in one line:
le_box + scale_x_discrete(limits = rev(levels(gapminder$continent)))

1.3 Tick mark labels

For discrete variables, the tick mark labels are taken directly from levels of the factor. However, sometimes the factor levels have short names that aren’t suitable for presentation.

le_box + scale_x_discrete(breaks = c("Asia","Oceania","Africa","Europe", "Americas"),
                          labels=c("Asia", "Oceanica", "Africa", "Europe", "Americas"))

# Hide x tick marks, labels, and grid lines
le_box + scale_x_discrete(breaks=NULL)

1.4 Continuous: change limits

Very important: avoid using xlim or ylim to control the limits of your plot if doing a boxplot. Example:

le_box + ylim(40, 80)
Warning: Removed 145 rows containing non-finite values (stat_boxplot).

If the y range is reduced using the method above, the data outside the range is dropped before the statistics are computed. This might be OK for a scatterplot, but it is a problem for the boxplots used here, because the medians and quartiles are then calculated from the remaining data only. Instead, use coord_cartesian(): rather than setting the limits of the data, it sets the viewing area of the plot.

# Using coord_cartesian "zooms" into the area
le_box + coord_cartesian(ylim=c(40, 80))
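
To see the difference in numbers rather than pictures, here is a small sketch with made-up data. It uses layer_data() from ggplot2 to pull out the statistics each version of the plot computes; `toy` and its values are hypothetical.

```r
library(ggplot2)

# Toy data (hypothetical): one group, with two values below 40 and one above 80
toy <- data.frame(group = "a", y = c(30, 35, 45, 50, 55, 60, 85))

p <- ggplot(toy, aes(x = group, y = y)) + geom_boxplot()

# ylim() drops out-of-range rows before the boxplot statistics are computed
# (the "Removed rows" warning is suppressed here to keep the output quiet)
clipped <- suppressWarnings(layer_data(p + ylim(40, 80)))

# coord_cartesian() only zooms, so the statistics use all the data
zoomed <- layer_data(p + coord_cartesian(ylim = c(40, 80)))

clipped$middle  # 52.5, the median of the four values inside the limits
zoomed$middle   # 50, the median of all seven values
```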

You can also expand the limits to make sure a particular value, such as 0, is included.

le_box + expand_limits(y = c(0, 100))

# le_box + scale_y_continuous(limits=c(0, 100))

1.5 Axis labels/text formatting

le_box + scale_x_discrete(name = "Continent") + scale_y_continuous(name = "Life Expectancy in Years")

You can also remove an axis title:

le_box + scale_x_discrete(name = "") + scale_y_continuous(name = "Life Expectancy in Years")

Both of these can also be accomplished with the xlab/ylab shorthand if all you want to do is change the axis labels:

le_box + xlab("") + ylab("Life Expectancy in Years")

# or even
# le_box + labs(x = "", y = "Life Expectancy in Years")

1.6 Specify tick marks

# Specify tick marks directly
le_box + scale_y_continuous(breaks = seq(20, 80, 10))  # Ticks from 20-80, every 10 years
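
Breaks can be paired with custom labels. The sketch below is one way to do it, using a made-up data frame so it runs on its own; `brks` and `lbls` are hypothetical names, and the labels vector must have the same length as the breaks.

```r
library(ggplot2)

# Toy data (hypothetical), standing in for gapminder
toy <- data.frame(group = c("a", "a", "b", "b"),
                  y     = c(25, 60, 40, 78))

# Tick positions every 10 years, plus one text label per tick
brks <- seq(20, 80, by = 10)
lbls <- paste(brks, "yrs")

p <- ggplot(toy, aes(x = group, y = y)) +
  geom_boxplot() +
  scale_y_continuous(breaks = brks, labels = lbls)
p
```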

2 Gridlines/themes

2.1 Hide gridlines

# Hide all the gridlines
le_box + theme(panel.grid.minor=element_blank(),
           panel.grid.major=element_blank())

# Hide just the minor gridlines
le_box + theme(panel.grid.minor=element_blank())

Or hide just the vertical or horizontal ones

# Hide all the vertical gridlines
le_box + theme(panel.grid.minor.x=element_blank(),
           panel.grid.major.x=element_blank())

# Hide all the horizontal gridlines
le_box + theme(panel.grid.minor.y=element_blank(),
           panel.grid.major.y=element_blank())

If you don't like the default theme_grey, try one of the other built-in themes:

le_box + theme_bw()

le_box + theme_minimal()

le_box + theme_classic()

le_box + theme_dark()

See more here: http://docs.ggplot2.org/current/ggtheme.html

You can also change individual theme elements. For example, the seaborn library in Python uses "#EAEAF2" as the panel fill. Let's change the background to that.

le_box + theme(panel.background = element_rect(fill = "#EAEAF2"))

You can also save this to use over and over again in one doc...

sb_back <- ggplot2::theme(
  panel.background = element_rect(fill = "#EAEAF2")
)
le_box + sb_back

Change color of gridlines:

  • panel.grid for all
  • panel.grid.major for just major (add .x or .y to control a single axis)
  • panel.grid.minor for just minor (add .x or .y to control a single axis)

All the arguments for element_line are documented here: http://docs.ggplot2.org/current/element_line.html

Here are the linetype options: http://docs.ggplot2.org/current/vignettes/ggplot2-specs.html

?element_line

Combinations are endless!

le_box + theme(panel.grid.major = element_line(colour = "navy",
                                               linetype = 6,
                                               size = .1))

3 Legends

Let's add some color to our simple boxplot.

le_box <- ggplot(gapminder, aes(x = continent, y = lifeExp, fill = continent)) +
  geom_boxplot()
le_box

Don't worry, we'll play with actual colors next week! For now, we'll focus on the legend that appeared.

3.1 Remove a legend

# Remove legend for a particular aesthetic (fill)
# ("none" is preferred; FALSE is deprecated in recent ggplot2 versions)
le_box + guides(fill = "none")

# It can also be done when specifying the scale
le_box + scale_fill_discrete(guide = "none")

# This removes all legends if you have more than 1
le_box + theme(legend.position="none")

3.2 Change order of legend

Remember how we changed the order of a discrete axis using scale_x_discrete (or scale_y_discrete)? Now we introduce the counterpart for other aesthetics.

le_box + scale_fill_discrete(breaks = c("Oceania", "Europe", "Asia", "Africa", "Americas"))

3.3 Hide legend title

# Remove title for fill legend
le_box + guides(fill=guide_legend(title=NULL))

# Remove title for all legends
le_box + theme(legend.title=element_blank())

3.4 Modify legend text

le_box + scale_fill_discrete(name="The\nContinent")

le_box + scale_fill_discrete(name="The\nContinent",
                         breaks=c("Oceania", "Europe", "Asia", "Africa", "Americas"),
                         labels=c("Oceanica", "Europe", "Asia", "Africa", "The Americas"))
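
One way to keep the axis and legend text in sync is to define the labels once as a named vector and pass it to both scales. This is a sketch with a made-up data frame; `continent_labels` is a hypothetical name, and the vector's names must match the factor levels.

```r
library(ggplot2)

# Toy data (hypothetical), standing in for gapminder
toy <- data.frame(
  continent = rep(c("Africa", "Americas", "Asia", "Europe", "Oceania"), each = 2),
  lifeExp   = c(50, 55, 68, 72, 60, 70, 75, 78, 79, 81)
)

# Names are the factor levels, values are the text to display
continent_labels <- c(Africa   = "Africa",
                      Americas = "The Americas",
                      Asia     = "Asia",
                      Europe   = "Europe",
                      Oceania  = "Oceanica")

p <- ggplot(toy, aes(x = continent, y = lifeExp, fill = continent)) +
  geom_boxplot() +
  scale_x_discrete(labels = continent_labels) +
  scale_fill_discrete(name = "The\nContinent", labels = continent_labels)
p
```

Because the labels are matched by name, their order in the vector does not have to match the order of the levels.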

Note that this didn’t change the x axis labels (we know how to do this though).

3.5 Legend box

3.6 Legend position

le_box + theme(legend.position="top")

le_box + theme(legend.position="bottom")

# Set the "anchoring point" of the legend (bottom-left is 0,0; top-right is 1,1)
# Put bottom-left corner of legend box in bottom-left corner of graph
le_box + theme(legend.justification=c(0,0), legend.position=c(0,0))

# Put bottom-right corner of legend box in bottom-right corner of graph
le_box + theme(legend.justification=c(1,0), legend.position=c(1,0))

Future topics that involve even more tweaking:

  • plot annotations via ggrepel, geom_text(), and annotate()

Next week:

  • colors!
  • more colors via packages
  • more themes via packages