Let's start with a simple boxplot
le_box <- ggplot(gapminder, aes(x = continent, y = lifeExp)) +
geom_boxplot()
le_box
le_box + coord_flip()
Brief digression into factor exploration
glimpse(gapminder)
Observations: 1,704
Variables: 6
$ country (fctr) Afghanistan, Afghanistan, Afghanistan, Afghanistan,...
$ continent (fctr) Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asi...
$ year (int) 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992...
$ lifeExp (dbl) 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.8...
$ pop (int) 8425333, 9240934, 10267083, 11537966, 13079460, 1488...
$ gdpPercap (dbl) 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 78...
str(gapminder$continent)
Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
levels(gapminder$continent)
[1] "Africa" "Americas" "Asia" "Europe" "Oceania"
class(gapminder$continent)
[1] "factor"
summary(gapminder$continent)
Africa Americas Asia Europe Oceania
624 300 396 360 24
Using dplyr
gapminder %>%
count(continent)
Source: local data frame [5 x 2]
continent n
(fctr) (int)
1 Africa 624
2 Americas 300
3 Asia 396
4 Europe 360
5 Oceania 24
Using forcats
# library(forcats)
fct_count(gapminder$continent)
Source: local data frame [5 x 2]
f n
(fctr) (int)
1 Africa 624
2 Americas 300
3 Asia 396
4 Europe 360
5 Oceania 24
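As a quick check that the base R and forcats counts agree, here is a minimal sketch on a made-up factor. The `toy` vector is hypothetical and not part of gapminder, and the sketch assumes forcats is installed.

```r
library(forcats)

# Hypothetical toy factor: levels default to alphabetical order
toy <- factor(c("b", "a", "c", "b", "b", "c"))

base_counts <- summary(toy)     # named integer vector, one entry per level
fc_counts   <- fct_count(toy)   # tibble with columns f (level) and n (count)

base_counts
fc_counts

# Same counts, most frequent level first
fct_count(toy, sort = TRUE)
```

The `sort = TRUE` variant is handy when you care about the most common levels rather than their alphabetical order.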
Default order is alphabetical. To enforce other orders...
## order by frequency
gapminder$continent %>%
levels()
[1] "Africa" "Americas" "Asia" "Europe" "Oceania"
gapminder$continent %>%
fct_infreq() %>%
levels() %>% head()
[1] "Africa" "Asia" "Europe" "Americas" "Oceania"
## backwards!
gapminder$continent %>%
fct_infreq() %>%
fct_rev() %>%
levels() %>% head()
[1] "Oceania" "Americas" "Europe" "Asia" "Africa"
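The same reordering is easier to verify by hand on a tiny factor. This is only a sketch: `toy` is a made-up vector, not gapminder data.

```r
library(forcats)

# Hypothetical toy factor with counts b: 3, c: 2, a: 1
toy <- factor(c("b", "a", "c", "b", "b", "c"))

levels(toy)                   # alphabetical: "a" "b" "c"
by_freq <- fct_infreq(toy)
levels(by_freq)               # most frequent first: "b" "c" "a"
levels(fct_rev(by_freq))      # reversed: "a" "c" "b"

# Only the level order changes; the values themselves are untouched
identical(as.character(toy), as.character(by_freq))
```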
Let's use this in a plot now.
ggplot(gapminder, aes(x = fct_infreq(continent), y = lifeExp)) +
geom_boxplot()
## order countries by median life expectancy (median is the default summary)
fct_reorder(gapminder$country, gapminder$lifeExp) %>%
levels() %>% head()
[1] "Sierra Leone" "Guinea-Bissau" "Afghanistan" "Angola"
[5] "Somalia" "Guinea"
## order according to minimum life exp instead of median
fct_reorder(gapminder$country, gapminder$lifeExp, min) %>%
levels() %>% head()
[1] "Rwanda" "Afghanistan" "Gambia" "Angola"
[5] "Sierra Leone" "Cambodia"
## backwards!
fct_reorder(gapminder$country, gapminder$lifeExp, .desc = TRUE) %>%
levels() %>% head()
[1] "Iceland" "Japan" "Sweden" "Switzerland" "Netherlands"
[6] "Norway"
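To see what the summary function and `.desc` are doing, here is a small sketch on made-up data where the medians and minimums can be computed by hand. The `grp` and `val` vectors are hypothetical.

```r
library(forcats)

# Hypothetical toy data: two observations per group
grp <- factor(c("a", "a", "b", "b", "c", "c"))
val <- c(10, 20, 1, 50, 5, 6)
# medians:  a = 15, b = 25.5, c = 5.5
# minimums: a = 10, b = 1,    c = 5

by_median <- levels(fct_reorder(grp, val))               # default .fun is median
by_min    <- levels(fct_reorder(grp, val, .fun = min))   # order by group minimum
by_desc   <- levels(fct_reorder(grp, val, .desc = TRUE)) # median, largest first

by_median   # "c" "a" "b"
by_min      # "b" "c" "a"
by_desc     # "b" "a" "c"
```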
Let's use this in a plot now!
ggplot(gapminder, aes(x = fct_reorder(continent, lifeExp),
y = lifeExp)) +
geom_boxplot()
Why do we reorder factor levels? It often makes plots much easier to read! When a factor is mapped to x or y, it should almost always be reordered by the quantitative variable you are mapping to the other axis.
gap_asia_2007 <- gapminder %>% filter(year == 2007, continent == "Asia")
ggplot(gap_asia_2007, aes(x = lifeExp, y = country)) + geom_point()
ggplot(gap_asia_2007, aes(x = lifeExp, y = fct_reorder(country, lifeExp))) +
geom_point()
Use fct_reorder2() when you have a line chart of a quantitative x against another quantitative y and your factor provides the color. This way the legend entries appear in the same order as the lines at the right-hand edge of the plot!
h_countries <- c("Egypt", "Haiti", "Romania", "Thailand", "Venezuela")
h_gap <- gapminder %>%
filter(country %in% h_countries) %>%
droplevels()
ggplot(h_gap, aes(x = year, y = lifeExp, color = country)) +
geom_line()
ggplot(h_gap, aes(x = year, y = lifeExp,
color = fct_reorder2(country, year, lifeExp))) +
geom_line() +
labs(color = "country")
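A minimal sketch of what fct_reorder2() computes, using made-up data rather than gapminder: by default it orders the levels by the y value at the largest x, in descending order. The `toy` data frame is hypothetical.

```r
library(forcats)

# Hypothetical toy series: three groups observed at x = 1 and x = 2
toy <- data.frame(
  grp = factor(rep(c("a", "b", "c"), each = 2)),
  x   = rep(c(1, 2), times = 3),
  y   = c(5, 1,   1, 9,   3, 4)
)
# y at the largest x: a = 1, b = 9, c = 4

legend_order <- levels(fct_reorder2(toy$grp, toy$x, toy$y))
legend_order   # "b" "c" "a": the highest line at the right edge comes first
```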
h_gap$country %>% fct_relevel("Romania", "Haiti") %>% levels()
[1] "Romania" "Haiti" "Egypt" "Thailand" "Venezuela"
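fct_relevel() moves the levels you name to the front and leaves the rest in their existing order. A small sketch on a made-up factor (`f` is hypothetical), including the `after` argument for moving levels elsewhere:

```r
library(forcats)

# Hypothetical toy factor with four levels
f <- factor(c("a", "b", "c", "d"))

front_one <- levels(fct_relevel(f, "c"))               # "c" "a" "b" "d"
front_two <- levels(fct_relevel(f, "d", "b"))          # "d" "b" "a" "c"
to_end    <- levels(fct_relevel(f, "a", after = Inf))  # "b" "c" "d" "a"

front_one
front_two
to_end
```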
Use ggplot2 scales and base R tricks
# Manually set the order of a discrete-valued axis
le_box + scale_x_discrete(limits = c("Asia","Oceania","Africa","Europe", "Americas"))
# Reverse the order of a discrete-valued axis
# Get the levels of the factor
clevels <- levels(gapminder$continent)
clevels
[1] "Africa" "Americas" "Asia" "Europe" "Oceania"
clevels <- rev(clevels)
clevels
[1] "Oceania" "Europe" "Asia" "Americas" "Africa"
le_box + scale_x_discrete(limits = clevels)
# Or it can be done in one line:
le_box + scale_x_discrete(limits = rev(levels(gapminder$continent)))
For discrete variables, the tick mark labels are taken directly from levels of the factor. However, sometimes the factor levels have short names that aren’t suitable for presentation.
le_box + scale_x_discrete(breaks = c("Asia","Oceania","Africa","Europe", "Americas"),
labels=c("Asia", "Oceanica", "Africa", "Europe", "Americas"))
# Hide x tick marks, labels, and grid lines
le_box + scale_x_discrete(breaks=NULL)
Very important: avoid using xlim or ylim to control the limits of your plot if doing a boxplot. Example:
le_box + ylim(40, 80)
Warning: Removed 145 rows containing non-finite values (stat_boxplot).
If the y range is reduced using the method above, the data outside the range are dropped before the boxplot statistics are computed, so the boxes themselves change. This might be OK for a scatterplot, but it can be problematic for the box plots used here. Instead, use coord_cartesian: rather than setting the limits of the data, it sets the viewing area of the plot.
# Using coord_cartesian "zooms" into the area
le_box + coord_cartesian(ylim=c(40, 80))
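The difference is easy to confirm numerically. This sketch uses a made-up five-value data set (the `toy` data frame is hypothetical) and `layer_data()` to pull out the median that ggplot2 computes for the box under each approach.

```r
library(ggplot2)

# Hypothetical toy data: one group, five values
toy <- data.frame(g = "grp", y = c(10, 20, 50, 60, 70))
p <- ggplot(toy, aes(x = g, y = y)) + geom_boxplot()

# layer_data() returns the statistics ggplot2 computed for a layer;
# for a boxplot, `middle` is the median line.
median_full   <- layer_data(p)$middle
median_ylim   <- suppressWarnings(layer_data(p + ylim(40, 80))$middle)
median_zoomed <- layer_data(p + coord_cartesian(ylim = c(40, 80)))$middle

median_full     # 50
median_ylim     # 60: 10 and 20 were dropped before computing the box
median_zoomed   # 50: same statistics, only the view changed
```

The `suppressWarnings()` call hides the same "Removed ... rows" warning shown above; it is used here only to keep the output tidy.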
You can also make sure the axis includes particular values, for example 0 as a minimum.
le_box + expand_limits(y = c(0, 100))
# le_box + scale_y_continuous(limits=c(0, 100))
le_box + scale_x_discrete(name = "Continent") + scale_y_continuous(name = "Life Expectancy in Years")
You can also take away an axis title
le_box + scale_x_discrete(name = "") + scale_y_continuous(name = "Life Expectancy in Years")
Both of these can also be accomplished by using the xlab/ylab shorthand if all you want to do is change the actual axis labels
le_box + xlab("") + ylab("Life Expectancy in Years")
# or even
# le_box + labs(x = "", y = "Life Expectancy in Years")
# Specify tick marks directly
le_box + scale_y_continuous(breaks = seq(20, 80, 10)) # Ticks from 20-80, every 10 years
# Hide all the gridlines
le_box + theme(panel.grid.minor=element_blank(),
panel.grid.major=element_blank())
# Hide just the minor gridlines
le_box + theme(panel.grid.minor=element_blank())
Or hide just the vertical or horizontal ones
# Hide all the vertical gridlines
le_box + theme(panel.grid.minor.x=element_blank(),
panel.grid.major.x=element_blank())
# Hide all the horizontal gridlines
le_box + theme(panel.grid.minor.y=element_blank(),
panel.grid.major.y=element_blank())
If you don't like the default theme_grey, try one of the other built-in themes:
le_box + theme_bw()
le_box + theme_minimal()
le_box + theme_classic()
le_box + theme_dark()
See more here: http://docs.ggplot2.org/current/ggtheme.html
You can also change individual theme elements. For example, the seaborn library in Python uses "#EAEAF2" as the panel fill. Let's change the background to that.
le_box + theme(panel.background = element_rect(fill = "#EAEAF2"))
You can also save this to use over and over again in one doc...
sb_back <- ggplot2::theme(
panel.background = element_rect(fill = "#EAEAF2")
)
le_box + sb_back
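Saved theme pieces can also be combined before they are applied. The sketch below is self-contained and uses made-up data; the object names `no_minor`, `my_theme`, and `toy` are hypothetical.

```r
library(ggplot2)

# Store theme tweaks in objects
sb_back  <- theme(panel.background = element_rect(fill = "#EAEAF2"))
no_minor <- theme(panel.grid.minor = element_blank())

# Theme objects can be added together, then applied to any plot
my_theme <- sb_back + no_minor

# Hypothetical toy data, just to have a plot to apply the theme to
toy <- data.frame(g = c("a", "a", "b", "b"), y = c(1, 2, 3, 5))
p <- ggplot(toy, aes(x = g, y = y)) + geom_boxplot() + my_theme
p
```

Keeping each tweak in its own small object makes it easy to mix and match them across plots in the same document.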
Change color of gridlines:
panel.grid for all
panel.grid.major for just major (add .x or .y to control axes)
panel.grid.minor for just minor (add .x or .y to control axes)
All the arguments for element_line are documented here: http://docs.ggplot2.org/current/element_line.html
Here are the linetype options: http://docs.ggplot2.org/current/vignettes/ggplot2-specs.html
?element_line
Combinations are endless!
le_box + theme(panel.grid.major = element_line(colour = "navy",
linetype = 6,
size = .1))
Let's add some color to our simple boxplot.
le_box <- ggplot(gapminder, aes(x = continent, y = lifeExp, fill = continent)) +
geom_boxplot()
le_box
Don't worry, we'll play with actual colors next week! For now, we'll focus on the legend that appeared.
# Remove legend for a particular aesthetic (fill)
le_box + guides(fill="none")
# It can also be done when specifying the scale
le_box + scale_fill_discrete(guide="none")
# This removes all legends if you have more than 1
le_box + theme(legend.position="none")
Remember how we changed the order of a discrete axis using scale_x_discrete (or scale_y_discrete)? Now we introduce the counterpart for other aesthetics: the breaks argument sets the order of the legend entries.
le_box + scale_fill_discrete(breaks = c("Oceania", "Europe", "Asia", "Africa", "Americas"))
# Remove title for fill legend
le_box + guides(fill=guide_legend(title=NULL))
# Remove title for all legends
le_box + theme(legend.title=element_blank())
le_box + scale_fill_discrete(name="The\nContinent")
le_box + scale_fill_discrete(name="The\nContinent",
breaks=c("Oceania", "Europe", "Asia", "Africa", "Americas"),
labels=c("Oceanica", "Europe", "Asia", "Africa", "The Americas"))
Note that this didn’t change the x axis labels (we know how to do this though).
le_box + theme(legend.position="top")
le_box + theme(legend.position="bottom")
# Set the "anchoring point" of the legend (bottom-left is 0,0; top-right is 1,1)
# Put bottom-left corner of legend box in bottom-left corner of graph
le_box + theme(legend.justification=c(0,0), legend.position=c(0,0))
# Put bottom-right corner of legend box in bottom-right corner of graph
le_box + theme(legend.justification=c(1,0), legend.position=c(1,0))
Future topics that involve even more tweaking: ggrepel, geom_text, annotate
Next week: