1 Making tables in Markdown

Sometimes you may just want to type a table directly in Markdown and ignore R. Four kinds of tables may be used. The first three kinds presuppose the use of a fixed-width font, such as Courier. The fourth kind can be used with proportionally spaced fonts, as it does not require lining up columns. All of the tables below render when typed outside of an R code chunk, because pandoc processes them when it renders your Markdown document. They should all work whether you are knitting to HTML or PDF.

1.1 Simple table

This code for a simple table:

  Right     Left     Center     Default
-------     ------ ----------   -------
     12     12        12            12
    123     123       123          123
      1     1          1             1

Table:  Demonstration of simple table syntax.

Produces this simple table:

Demonstration of simple table syntax.

| Right | Left | Center | Default |
|------:|:-----|:------:|---------|
|    12 | 12   |   12   | 12      |
|   123 | 123  |  123   | 123     |
|     1 | 1    |   1    | 1       |

The headers and table rows must each fit on one line. Column alignments are determined by the position of the header text relative to the dashed line below it:

  • If the dashed line is flush with the header text on the right side but extends beyond it on the left, the column is right-aligned.
  • If the dashed line is flush with the header text on the left side but extends beyond it on the right, the column is left-aligned.
  • If the dashed line extends beyond the header text on both sides, the column is centered.
  • If the dashed line is flush with the header text on both sides, the default alignment is used (in most cases, this will be left).

The table must end with a blank line, or a line of dashes followed by a blank line.

The column headers may be omitted, provided a dashed line is used to end the table.

1.2 Multi-line tables

This code for a multi-line table:

-------------------------------------------------------------
 Centered   Default           Right Left
  Header    Aligned         Aligned Aligned
----------- ------- --------------- -------------------------
   First    row                12.0 Example of a row that
                                    spans multiple lines.

  Second    row                 5.0 Here's another one. Note
                                    the blank line between
                                    rows.
-------------------------------------------------------------

Table: Here's the caption. It, too, may span
multiple lines.

Produces this multi-line table:

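Here's the caption. It, too, may span multiple lines.

| Centered Header | Default Aligned | Right Aligned | Left Aligned                                          |
|:---------------:|-----------------|--------------:|:------------------------------------------------------|
|      First      | row             |          12.0 | Example of a row that spans multiple lines.           |
|     Second      | row             |           5.0 | Here's another one. Note the blank line between rows. |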

1.3 Grid tables

This code for a grid table:

: Sample grid table.

+---------------+---------------+--------------------+
| Fruit         | Price         | Advantages         |
+===============+===============+====================+
| Bananas       | $1.34         | - built-in wrapper |
|               |               | - bright color     |
+---------------+---------------+--------------------+
| Oranges       | $2.10         | - cures scurvy     |
|               |               | - tasty            |
+---------------+---------------+--------------------+

Produces this grid table:

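Sample grid table.

+---------------+---------------+--------------------+
| Fruit         | Price         | Advantages         |
+===============+===============+====================+
| Bananas       | $1.34         | • built-in wrapper |
|               |               | • bright color     |
+---------------+---------------+--------------------+
| Oranges       | $2.10         | • cures scurvy     |
|               |               | • tasty            |
+---------------+---------------+--------------------+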

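In older versions of pandoc, grid tables do not support alignments or cells that span multiple columns or rows. More recent versions have added both, so check the pandoc manual for the version you have installed.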

1.4 Pipe tables

This code for a pipe table:

| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
|   12  |  12  |    12   |    12  |
|  123  |  123 |   123   |   123  |
|    1  |    1 |     1   |     1  |

  : Demonstration of pipe table syntax.

Produces this pipe table:

Demonstration of pipe table syntax.

| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
|    12 | 12   | 12      |   12   |
|   123 | 123  | 123     |  123   |
|     1 | 1    | 1       |   1    |

2 Making tables in R

If you want to make tables that include R output (such as means, variances, or output from models), there are two steps:

  1. Get the numbers you need in tabular format; then
  2. Render that information in an aesthetically-pleasing way.

This section covers (1), which is easy in R. Although there are some nice options for (2) within R Markdown via various packages, I am not dogmatic about doing everything in R Markdown, especially things like (2).
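
To connect the two steps with the pipe table syntax from section 1.4, here is a minimal base R sketch that turns a small data frame into pipe table text. The function name to_pipe_table and the stats data frame are hypothetical, made up for this illustration; they are not part of any package, and the rounded numbers are placeholders.

```r
# Hypothetical helper (not from any package): format a data frame as the
# lines of a pandoc pipe table, right-aligning numeric columns.
to_pipe_table <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  header <- paste0("| ", paste(names(df), collapse = " | "), " |")
  # "---:" right-aligns a column, ":---" left-aligns it
  rule <- paste0("|", paste(ifelse(is_num, "---:", ":---"), collapse = "|"), "|")
  # convert every column to character without padding
  cells <- lapply(df, function(col) {
    if (is.numeric(col)) format(col, trim = TRUE) else as.character(col)
  })
  # paste the columns together row by row
  rows <- do.call(paste, c(unname(cells), sep = " | "))
  c(header, rule, paste0("| ", rows, " |"))
}

# Made-up summary numbers, step (1) already done
stats <- data.frame(
  stat = c("mean", "sd", "median"),
  value = c(6.13, 29.11, -2),
  stringsAsFactors = FALSE
)

# Step (2): print the Markdown, ready to paste outside an R code chunk
writeLines(to_pipe_table(stats))
```

Because pipe tables do not require lined-up columns, the helper does not need to pad the cells.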

2.1 dplyr

We'll use the pnwflights14 package to practice our dplyr skills. We need to install the package from GitHub using devtools.

# once per machine
install.packages("devtools")
devtools::install_github("ismayc/pnwflights14")

Now, we need to load the flights dataset from the pnwflights14 package.

# once per work session
library(dplyr)  # glimpse(), select(), filter(), %>%, ...
library(tidyr)  # gather(), separate()
data("flights", package = "pnwflights14")

Brief high-level overview (HLO) of the flights data:

dim(flights)
[1] 162049     16
glimpse(flights)
Observations: 162,049
Variables: 16
$ year      (int) 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014...
$ month     (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ day       (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ dep_time  (int) 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 5...
$ dep_delay (dbl) 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3,...
$ arr_time  (int) 235, 738, 548, 800, 325, 747, 936, 1148, 917, 1334, ...
$ arr_delay (dbl) 70, -23, -4, -23, 43, 88, 219, 15, 24, -6, 4, 12, -1...
$ carrier   (chr) "AS", "US", "UA", "US", "AS", "DL", "UA", "UA", "UA"...
$ tailnum   (chr) "N508AS", "N195UW", "N37422", "N547UW", "N762AS", "N...
$ flight    (int) 145, 1830, 1609, 466, 121, 1823, 1481, 229, 1576, 47...
$ origin    (chr) "PDX", "SEA", "PDX", "PDX", "SEA", "SEA", "SEA", "PD...
$ dest      (chr) "ANC", "CLT", "IAH", "CLT", "ANC", "DTW", "ORD", "IA...
$ air_time  (dbl) 194, 252, 201, 251, 201, 224, 202, 217, 136, 268, 13...
$ distance  (dbl) 1542, 2279, 1825, 2282, 1448, 1927, 1721, 1825, 1024...
$ hour      (dbl) 0, 0, 0, 0, 0, 0, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6...
$ minute    (dbl) 1, 4, 8, 28, 34, 37, 46, 26, 27, 36, 41, 49, 50, 57,...
names(flights)
 [1] "year"      "month"     "day"       "dep_time"  "dep_delay"
 [6] "arr_time"  "arr_delay" "carrier"   "tailnum"   "flight"   
[11] "origin"    "dest"      "air_time"  "distance"  "hour"     
[16] "minute"   

2.1.1 dplyr::select

Use select to specify which columns in a dataframe you'd like to keep by name. Base R can also keep columns by name, but only with quoted names (for example, flights[, c("carrier", "flight")]); ranges and negation generally require numeric variable positions. Most of the time, you keep track of your variables by name (like carrier) rather than position (the 8th column), and select lets you use bare names, ranges of names, and negation directly.

# keep these 2 cols
mini_flights <- flights %>% 
  select(carrier, flight)
glimpse(mini_flights)
Observations: 162,049
Variables: 2
$ carrier (chr) "AS", "US", "UA", "US", "AS", "DL", "UA", "UA", "UA", ...
$ flight  (int) 145, 1830, 1609, 466, 121, 1823, 1481, 229, 1576, 478,...
# keep first five cols
first_five <- flights %>% 
  select(year, month, day, dep_time, dep_delay)
glimpse(first_five)
Observations: 162,049
Variables: 5
$ year      (int) 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014...
$ month     (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ day       (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ dep_time  (int) 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 5...
$ dep_delay (dbl) 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3,...
# alternatively, specify range
first_five <- flights %>% 
  select(year:dep_delay)
glimpse(first_five)
Observations: 162,049
Variables: 5
$ year      (int) 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014...
$ month     (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ day       (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ dep_time  (int) 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 5...
$ dep_delay (dbl) 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3,...
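
For comparison with base R, here is a small sketch on a made-up data frame. The name toy_flights and its values are hypothetical stand-ins for the flights data, chosen so the example runs without pnwflights14.

```r
library(dplyr)

# Hypothetical toy data, standing in for the flights dataset
toy_flights <- data.frame(
  year = 2014L,
  carrier = c("AS", "US", "UA"),
  flight = c(145L, 1830L, 1609L),
  dep_delay = c(96, -6, 13),
  stringsAsFactors = FALSE
)

# base R: keep columns by quoted name
base_pick <- toy_flights[, c("carrier", "flight")]

# dplyr: the same columns by bare name
dplyr_pick <- toy_flights %>%
  select(carrier, flight)

# dplyr: a range of adjacent columns, by name rather than position
range_pick <- toy_flights %>%
  select(carrier:dep_delay)

identical(names(base_pick), names(dplyr_pick))
names(range_pick)
```

Both approaches return the same two columns; the range syntax is where select saves the most typing.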

We can also choose the columns we want by negation; that is, we can specify which columns to drop instead of which to keep. All variables not listed are kept.

# we can also use negation
all_but_year <- flights %>% 
  select(-year)
glimpse(all_but_year)
Observations: 162,049
Variables: 15
$ month     (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ day       (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ dep_time  (int) 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 5...
$ dep_delay (dbl) 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3,...
$ arr_time  (int) 235, 738, 548, 800, 325, 747, 936, 1148, 917, 1334, ...
$ arr_delay (dbl) 70, -23, -4, -23, 43, 88, 219, 15, 24, -6, 4, 12, -1...
$ carrier   (chr) "AS", "US", "UA", "US", "AS", "DL", "UA", "UA", "UA"...
$ tailnum   (chr) "N508AS", "N195UW", "N37422", "N547UW", "N762AS", "N...
$ flight    (int) 145, 1830, 1609, 466, 121, 1823, 1481, 229, 1576, 47...
$ origin    (chr) "PDX", "SEA", "PDX", "PDX", "SEA", "SEA", "SEA", "PD...
$ dest      (chr) "ANC", "CLT", "IAH", "CLT", "ANC", "DTW", "ORD", "IA...
$ air_time  (dbl) 194, 252, 201, 251, 201, 224, 202, 217, 136, 268, 13...
$ distance  (dbl) 1542, 2279, 1825, 2282, 1448, 1927, 1721, 1825, 1024...
$ hour      (dbl) 0, 0, 0, 0, 0, 0, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6...
$ minute    (dbl) 1, 4, 8, 28, 34, 37, 46, 26, 27, 36, 41, 49, 50, 57,...

dplyr::select also comes with several helper functions that match columns by name, such as starts_with(), contains(), and ends_with():

depart <- flights %>% 
  select(starts_with("dep_"))
glimpse(depart)
Observations: 162,049
Variables: 2
$ dep_time  (int) 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 5...
$ dep_delay (dbl) 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3,...
times <- flights %>% 
  select(contains("time"))
glimpse(times)
Observations: 162,049
Variables: 3
$ dep_time (int) 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 55...
$ arr_time (int) 235, 738, 548, 800, 325, 747, 936, 1148, 917, 1334, 9...
$ air_time (dbl) 194, 252, 201, 251, 201, 224, 202, 217, 136, 268, 130...
# here I am not creating a new dataframe
flights %>%
  select(-contains("time"))
Source: local data frame [162,049 x 13]

    year month   day dep_delay arr_delay carrier tailnum flight origin
   (int) (int) (int)     (dbl)     (dbl)   (chr)   (chr)  (int)  (chr)
1   2014     1     1        96        70      AS  N508AS    145    PDX
2   2014     1     1        -6       -23      US  N195UW   1830    SEA
3   2014     1     1        13        -4      UA  N37422   1609    PDX
4   2014     1     1        -2       -23      US  N547UW    466    PDX
5   2014     1     1        44        43      AS  N762AS    121    SEA
6   2014     1     1        82        88      DL  N806DN   1823    SEA
7   2014     1     1       227       219      UA  N14219   1481    SEA
8   2014     1     1        -4        15      UA  N813UA    229    PDX
9   2014     1     1         7        24      UA  N75433   1576    SEA
10  2014     1     1         1        -6      UA  N574UA    478    SEA
..   ...   ...   ...       ...       ...     ...     ...    ...    ...
Variables not shown: dest (chr), distance (dbl), hour (dbl), minute (dbl)
delays <- flights %>% 
  select(ends_with("delay"))
glimpse(delays)
Observations: 162,049
Variables: 2
$ dep_delay (dbl) 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3,...
$ arr_delay (dbl) 70, -23, -4, -23, 43, 88, 219, 15, 24, -6, 4, 12, -1...

One of my favorite select helper functions is everything(), which allows you to use select to keep all your variables, but easily rearrange the columns without having to list all the variables to keep/drop.

new_order <- flights %>% 
  select(origin, dest, everything())
head(new_order)
Source: local data frame [6 x 16]

  origin  dest  year month   day dep_time dep_delay arr_time arr_delay
   (chr) (chr) (int) (int) (int)    (int)     (dbl)    (int)     (dbl)
1    PDX   ANC  2014     1     1        1        96      235        70
2    SEA   CLT  2014     1     1        4        -6      738       -23
3    PDX   IAH  2014     1     1        8        13      548        -4
4    PDX   CLT  2014     1     1       28        -2      800       -23
5    SEA   ANC  2014     1     1       34        44      325        43
6    SEA   DTW  2014     1     1       37        82      747        88
Variables not shown: carrier (chr), tailnum (chr), flight (int), air_time
  (dbl), distance (dbl), hour (dbl), minute (dbl)
# with negation
new_order2 <- flights %>% 
  select(origin, dest, everything(), -year)
head(new_order2)
Source: local data frame [6 x 15]

  origin  dest month   day dep_time dep_delay arr_time arr_delay carrier
   (chr) (chr) (int) (int)    (int)     (dbl)    (int)     (dbl)   (chr)
1    PDX   ANC     1     1        1        96      235        70      AS
2    SEA   CLT     1     1        4        -6      738       -23      US
3    PDX   IAH     1     1        8        13      548        -4      UA
4    PDX   CLT     1     1       28        -2      800       -23      US
5    SEA   ANC     1     1       34        44      325        43      AS
6    SEA   DTW     1     1       37        82      747        88      DL
Variables not shown: tailnum (chr), flight (int), air_time (dbl), distance
  (dbl), hour (dbl), minute (dbl)

We can also rename variables within select.

flights2 <- flights %>%
  select(tail_num = tailnum, everything())
head(flights2)
Source: local data frame [6 x 16]

  tail_num  year month   day dep_time dep_delay arr_time arr_delay carrier
     (chr) (int) (int) (int)    (int)     (dbl)    (int)     (dbl)   (chr)
1   N508AS  2014     1     1        1        96      235        70      AS
2   N195UW  2014     1     1        4        -6      738       -23      US
3   N37422  2014     1     1        8        13      548        -4      UA
4   N547UW  2014     1     1       28        -2      800       -23      US
5   N762AS  2014     1     1       34        44      325        43      AS
6   N806DN  2014     1     1       37        82      747        88      DL
Variables not shown: flight (int), origin (chr), dest (chr), air_time
  (dbl), distance (dbl), hour (dbl), minute (dbl)

If you don't want to move the renamed variables within your dataframe, you can use the rename function.

flights3 <- flights %>%
  rename(tail_num = tailnum)
glimpse(flights3)
Observations: 162,049
Variables: 16
$ year      (int) 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014...
$ month     (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ day       (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ dep_time  (int) 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 5...
$ dep_delay (dbl) 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3,...
$ arr_time  (int) 235, 738, 548, 800, 325, 747, 936, 1148, 917, 1334, ...
$ arr_delay (dbl) 70, -23, -4, -23, 43, 88, 219, 15, 24, -6, 4, 12, -1...
$ carrier   (chr) "AS", "US", "UA", "US", "AS", "DL", "UA", "UA", "UA"...
$ tail_num  (chr) "N508AS", "N195UW", "N37422", "N547UW", "N762AS", "N...
$ flight    (int) 145, 1830, 1609, 466, 121, 1823, 1481, 229, 1576, 47...
$ origin    (chr) "PDX", "SEA", "PDX", "PDX", "SEA", "SEA", "SEA", "PD...
$ dest      (chr) "ANC", "CLT", "IAH", "CLT", "ANC", "DTW", "ORD", "IA...
$ air_time  (dbl) 194, 252, 201, 251, 201, 224, 202, 217, 136, 268, 13...
$ distance  (dbl) 1542, 2279, 1825, 2282, 1448, 1927, 1721, 1825, 1024...
$ hour      (dbl) 0, 0, 0, 0, 0, 0, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6...
$ minute    (dbl) 1, 4, 8, 28, 34, 37, 46, 26, 27, 36, 41, 49, 50, 57,...

2.1.2 dplyr::filter

# flights taking off from PDX
pdx <- flights %>% 
  filter(origin == "PDX")
head(pdx)
Source: local data frame [6 x 16]

   year month   day dep_time dep_delay arr_time arr_delay carrier tailnum
  (int) (int) (int)    (int)     (dbl)    (int)     (dbl)   (chr)   (chr)
1  2014     1     1        1        96      235        70      AS  N508AS
2  2014     1     1        8        13      548        -4      UA  N37422
3  2014     1     1       28        -2      800       -23      US  N547UW
4  2014     1     1      526        -4     1148        15      UA  N813UA
5  2014     1     1      541         1      911         4      UA  N36476
6  2014     1     1      549        24      907        12      US  N548UW
Variables not shown: flight (int), origin (chr), dest (chr), air_time
  (dbl), distance (dbl), hour (dbl), minute (dbl)
# january flights from PDX
pdx_jan <- flights %>% 
  filter(origin == "PDX", month == 1) # the comma is an "and"
head(pdx_jan)
Source: local data frame [6 x 16]

   year month   day dep_time dep_delay arr_time arr_delay carrier tailnum
  (int) (int) (int)    (int)     (dbl)    (int)     (dbl)   (chr)   (chr)
1  2014     1     1        1        96      235        70      AS  N508AS
2  2014     1     1        8        13      548        -4      UA  N37422
3  2014     1     1       28        -2      800       -23      US  N547UW
4  2014     1     1      526        -4     1148        15      UA  N813UA
5  2014     1     1      541         1      911         4      UA  N36476
6  2014     1     1      549        24      907        12      US  N548UW
Variables not shown: flight (int), origin (chr), dest (chr), air_time
  (dbl), distance (dbl), hour (dbl), minute (dbl)
# flights to ATL (Atlanta) or BNA (Nashville)
to_south <- flights %>% 
  filter(dest == "ATL" | dest == "BNA") %>% # | is "or"
  select(origin, dest, everything())
head(to_south)
Source: local data frame [6 x 16]

  origin  dest  year month   day dep_time dep_delay arr_time arr_delay
   (chr) (chr) (int) (int) (int)    (int)     (dbl)    (int)     (dbl)
1    SEA   ATL  2014     1     1      624        -6     1401        -6
2    SEA   ATL  2014     1     1      802        -3     1533       -17
3    SEA   ATL  2014     1     1      824        -1     1546       -14
4    PDX   ATL  2014     1     1      944        -6     1727        -8
5    PDX   ATL  2014     1     1     1054        94     1807        84
6    SEA   ATL  2014     1     1     1158         6     1915       -14
Variables not shown: carrier (chr), tailnum (chr), flight (int), air_time
  (dbl), distance (dbl), hour (dbl), minute (dbl)
# flights from PDX to ATL (Atlanta) or BNA (Nashville)
pdx_to_south <- flights %>% 
  filter(origin == "PDX", dest == "ATL" | dest == "BNA") %>% # | is "or"
  select(origin, dest, everything())
head(pdx_to_south)
Source: local data frame [6 x 16]

  origin  dest  year month   day dep_time dep_delay arr_time arr_delay
   (chr) (chr) (int) (int) (int)    (int)     (dbl)    (int)     (dbl)
1    PDX   ATL  2014     1     1      944        -6     1727        -8
2    PDX   ATL  2014     1     1     1054        94     1807        84
3    PDX   ATL  2014     1     1     1323        -2     2038       -15
4    PDX   ATL  2014     1     1     2253         8      611         4
5    PDX   ATL  2014     1     2      627        -3     1350        -7
6    PDX   ATL  2014     1     2      918        -2     1643        -2
Variables not shown: carrier (chr), tailnum (chr), flight (int), air_time
  (dbl), distance (dbl), hour (dbl), minute (dbl)
# alternatively, using group membership
south_dests <- c("ATL", "BNA")
pdx_to_south2 <- flights %>% 
  filter(origin == "PDX", dest %in% south_dests) %>% 
  select(origin, dest, everything())
head(pdx_to_south2)
Source: local data frame [6 x 16]

  origin  dest  year month   day dep_time dep_delay arr_time arr_delay
   (chr) (chr) (int) (int) (int)    (int)     (dbl)    (int)     (dbl)
1    PDX   ATL  2014     1     1      944        -6     1727        -8
2    PDX   ATL  2014     1     1     1054        94     1807        84
3    PDX   ATL  2014     1     1     1323        -2     2038       -15
4    PDX   ATL  2014     1     1     2253         8      611         4
5    PDX   ATL  2014     1     2      627        -3     1350        -7
6    PDX   ATL  2014     1     2      918        -2     1643        -2
Variables not shown: carrier (chr), tailnum (chr), flight (int), air_time
  (dbl), distance (dbl), hour (dbl), minute (dbl)
# flights delayed by 1 hour or more
delay_1plus <- flights %>%
  filter(dep_delay >= 60)
head(delay_1plus)
Source: local data frame [6 x 16]

   year month   day dep_time dep_delay arr_time arr_delay carrier tailnum
  (int) (int) (int)    (int)     (dbl)    (int)     (dbl)   (chr)   (chr)
1  2014     1     1        1        96      235        70      AS  N508AS
2  2014     1     1       37        82      747        88      DL  N806DN
3  2014     1     1      346       227      936       219      UA  N14219
4  2014     1     1      650        90     1037        91      US  N626AW
5  2014     1     1      959       164     1137       157      AS  N534AS
6  2014     1     1     1008        68     1242        64      AS  N788AS
Variables not shown: flight (int), origin (chr), dest (chr), air_time
  (dbl), distance (dbl), hour (dbl), minute (dbl)
# flights delayed by at least 1 hour, but less than 2 hours
delay_1hr <- flights %>%
  filter(dep_delay >= 60, dep_delay < 120)
head(delay_1hr)
Source: local data frame [6 x 16]

   year month   day dep_time dep_delay arr_time arr_delay carrier tailnum
  (int) (int) (int)    (int)     (dbl)    (int)     (dbl)   (chr)   (chr)
1  2014     1     1        1        96      235        70      AS  N508AS
2  2014     1     1       37        82      747        88      DL  N806DN
3  2014     1     1      650        90     1037        91      US  N626AW
4  2014     1     1     1008        68     1242        64      AS  N788AS
5  2014     1     1     1014        75     1613        81      UA  N37408
6  2014     1     1     1036        81     1408        63      OO  N218AG
Variables not shown: flight (int), origin (chr), dest (chr), air_time
  (dbl), distance (dbl), hour (dbl), minute (dbl)
range(delay_1hr$dep_delay, na.rm = TRUE)
[1]  60 119
# more concise using between (both endpoints are always inclusive)
delay_bwn <- flights %>%
  filter(between(dep_delay, 60, 119))
head(delay_bwn)
Source: local data frame [6 x 16]

   year month   day dep_time dep_delay arr_time arr_delay carrier tailnum
  (int) (int) (int)    (int)     (dbl)    (int)     (dbl)   (chr)   (chr)
1  2014     1     1        1        96      235        70      AS  N508AS
2  2014     1     1       37        82      747        88      DL  N806DN
3  2014     1     1      650        90     1037        91      US  N626AW
4  2014     1     1     1008        68     1242        64      AS  N788AS
5  2014     1     1     1014        75     1613        81      UA  N37408
6  2014     1     1     1036        81     1408        63      OO  N218AG
Variables not shown: flight (int), origin (chr), dest (chr), air_time
  (dbl), distance (dbl), hour (dbl), minute (dbl)
range(delay_bwn$dep_delay, na.rm = TRUE)
[1]  60 119

2.1.3 Logical tests in R

Very useful when combined with dplyr::filter

?Comparison
Operator    Description
----------  --------------------------
<           less than
<=          less than or equal to
>           greater than
>=          greater than or equal to
==          exactly equal to
!=          not equal to
%in%        group membership
is.na()     is NA
!is.na()    is not NA

?base::Logic
Operator    Description
----------  --------------------------------------
x & y       x AND y (logical and)
x | y       x OR y (logical or)
xor(x, y)   exactly one of x or y (exclusive or)
!x          not x (logical negation)
any()       any true
all()       all true
isTRUE(x)   test if x is TRUE

Logical or (|) is inclusive, so x | y really means:

  • x or
  • y or
  • both x & y

Exclusive or (xor) is exclusive, so xor(x, y) really means:

  • x or
  • y,
  • but not both x & y

x <- c(0, 1, 0, 1)
y <- c(0, 0, 1, 1)
boolean_or <- x | y
exclusive_or <- xor(x, y)
cbind(x, y, boolean_or, exclusive_or)
     x y boolean_or exclusive_or
[1,] 0 0          0            0
[2,] 1 0          1            1
[3,] 0 1          1            1
[4,] 1 1          1            0
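
To show how these operators behave inside dplyr::filter, here is a short sketch on a made-up data frame (toy and its values are hypothetical). One point worth seeing in action: filter keeps only rows where the condition is TRUE, so rows where it evaluates to NA are dropped unless you test for NA explicitly.

```r
library(dplyr)

# Hypothetical toy data with one missing delay
toy <- data.frame(
  dest = c("ATL", "BNA", "ANC", "ATL"),
  dep_delay = c(75, NA, 130, -3),
  stringsAsFactors = FALSE
)

# %in% tests group membership; & combines conditions (like a comma in filter)
late_south <- toy %>%
  filter(dest %in% c("ATL", "BNA") & dep_delay >= 60)

# the BNA row is not in late_south: NA >= 60 is NA, and filter drops NA rows
missing_delay <- toy %>%
  filter(is.na(dep_delay))

# keep rows that are either late or missing a delay value
late_or_missing <- toy %>%
  filter(dep_delay >= 60 | is.na(dep_delay))

nrow(late_south)
nrow(missing_delay)
nrow(late_or_missing)
```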

2.1.4 dplyr::arrange

# default is ascending order
flights %>% 
  arrange(year, month, day)
Source: local data frame [162,049 x 16]

    year month   day dep_time dep_delay arr_time arr_delay carrier tailnum
   (int) (int) (int)    (int)     (dbl)    (int)     (dbl)   (chr)   (chr)
1   2014     1     1        1        96      235        70      AS  N508AS
2   2014     1     1        4        -6      738       -23      US  N195UW
3   2014     1     1        8        13      548        -4      UA  N37422
4   2014     1     1       28        -2      800       -23      US  N547UW
5   2014     1     1       34        44      325        43      AS  N762AS
6   2014     1     1       37        82      747        88      DL  N806DN
7   2014     1     1      346       227      936       219      UA  N14219
8   2014     1     1      526        -4     1148        15      UA  N813UA
9   2014     1     1      527         7      917        24      UA  N75433
10  2014     1     1      536         1     1334        -6      UA  N574UA
..   ...   ...   ...      ...       ...      ...       ...     ...     ...
Variables not shown: flight (int), origin (chr), dest (chr), air_time
  (dbl), distance (dbl), hour (dbl), minute (dbl)
# descending order
flights %>% 
  arrange(desc(year), desc(month), desc(day))
Source: local data frame [162,049 x 16]

    year month   day dep_time dep_delay arr_time arr_delay carrier tailnum
   (int) (int) (int)    (int)     (dbl)    (int)     (dbl)   (chr)   (chr)
1   2014    12    31        2        12      601        31      AA  N3JKAA
2   2014    12    31       27        -3      623         3      AA  N3EWAA
3   2014    12    31       39        14      324         4      AS  N762AS
4   2014    12    31       40         0      549         0      DL  N757AT
5   2014    12    31       52        -8      917       -21      AA  N3JFAA
6   2014    12    31       54         4      621        17      DL  N128DL
7   2014    12    31       56        61      848        80      DL  N655DL
8   2014    12    31      512        -3      904         4      US  N653AW
9   2014    12    31      515        -5      855         5      US  N580UW
10  2014    12    31      534         4      859         7      UA  N34460
..   ...   ...   ...      ...       ...      ...       ...     ...     ...
Variables not shown: flight (int), origin (chr), dest (chr), air_time
  (dbl), distance (dbl), hour (dbl), minute (dbl)

2.1.5 dplyr::distinct

Note: we are going to start chaining multiple pipe operators together now. You can chain all tidyr and dplyr functions together!

# all unique origin-dest combinations
flights %>% 
  select(origin, dest) %>% 
  distinct
Source: local data frame [115 x 2]

   origin  dest
    (chr) (chr)
1     PDX   ANC
2     SEA   CLT
3     PDX   IAH
4     PDX   CLT
5     SEA   ANC
6     SEA   DTW
7     SEA   ORD
8     SEA   DEN
9     SEA   EWR
10    PDX   DEN
..    ...   ...
# all unique destinations from PDX (there are 49)
from_pdx <- flights %>% 
  filter(origin == "PDX") %>% 
  select(origin, dest) %>%
  distinct(dest)
head(from_pdx)
Source: local data frame [6 x 2]

  origin  dest
   (chr) (chr)
1    PDX   ANC
2    PDX   IAH
3    PDX   CLT
4    PDX   DEN
5    PDX   PHX
6    PDX   ORD
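
The output above keeps origin alongside dest. In later dplyr releases, distinct(dest) returns only the dest column unless you also pass .keep_all = TRUE, so your result may differ depending on the installed version. The sketch below assumes a recent dplyr and uses a made-up data frame (toy_routes is hypothetical).

```r
library(dplyr)

# Hypothetical toy data with a repeated PDX destination
toy_routes <- data.frame(
  origin = c("PDX", "PDX", "PDX", "SEA"),
  dest = c("ANC", "IAH", "ANC", "ANC"),
  stringsAsFactors = FALSE
)

# recent dplyr: only the dest column is returned
dest_only <- toy_routes %>%
  filter(origin == "PDX") %>%
  distinct(dest)

# .keep_all = TRUE keeps the other columns (first row per distinct value)
with_origin <- toy_routes %>%
  filter(origin == "PDX") %>%
  distinct(dest, .keep_all = TRUE)

names(dest_only)
names(with_origin)
```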

2.1.6 dplyr::mutate

# add total delay variable
flights %>%
  mutate(tot_delay = dep_delay + arr_delay) %>%
  select(origin, dest, ends_with("delay"), everything())
Source: local data frame [162,049 x 17]

   origin  dest dep_delay arr_delay tot_delay  year month   day dep_time
    (chr) (chr)     (dbl)     (dbl)     (dbl) (int) (int) (int)    (int)
1     PDX   ANC        96        70       166  2014     1     1        1
2     SEA   CLT        -6       -23       -29  2014     1     1        4
3     PDX   IAH        13        -4         9  2014     1     1        8
4     PDX   CLT        -2       -23       -25  2014     1     1       28
5     SEA   ANC        44        43        87  2014     1     1       34
6     SEA   DTW        82        88       170  2014     1     1       37
7     SEA   ORD       227       219       446  2014     1     1      346
8     PDX   IAH        -4        15        11  2014     1     1      526
9     SEA   DEN         7        24        31  2014     1     1      527
10    SEA   EWR         1        -6        -5  2014     1     1      536
..    ...   ...       ...       ...       ...   ...   ...   ...      ...
Variables not shown: arr_time (int), carrier (chr), tailnum (chr), flight
  (int), air_time (dbl), distance (dbl), hour (dbl), minute (dbl)
# which flights were delayed at departure but still arrived on time or early?
arrivals <- flights %>%
  mutate(arr_ok = ifelse(dep_delay > 0 & arr_delay <= 0, 1, 0)) %>% 
  select(origin, dest, ends_with("delay"), carrier, arr_ok)

# peek at it
arrivals %>%
  filter(arr_ok == 1) %>%
  head
Source: local data frame [6 x 6]

  origin  dest dep_delay arr_delay carrier arr_ok
   (chr) (chr)     (dbl)     (dbl)   (chr)  (dbl)
1    PDX   IAH        13        -4      UA      1
2    SEA   EWR         1        -6      UA      1
3    SEA   SAN         2       -12      AS      1
4    PDX   EWR         2       -19      UA      1
5    SEA   IAH        13        -4      UA      1
6    PDX   IAD        10        -4      UA      1

2.1.7 dplyr::summarise (or dplyr::summarize)

summarise collapses a dataframe into a single row.

flights %>%
  summarise(mean(dep_delay, na.rm = TRUE))
Source: local data frame [1 x 1]

  mean(dep_delay, na.rm = TRUE)
                          (dbl)
1                      6.133859
# we can also name that variable, and summarise multiple variables
flights %>%
  summarise(mean_delay = mean(dep_delay, na.rm = TRUE),
            sd_delay = sd(dep_delay, na.rm = TRUE),
            median_delay = median(dep_delay, na.rm = TRUE))
Source: local data frame [1 x 3]

  mean_delay sd_delay median_delay
       (dbl)    (dbl)        (dbl)
1   6.133859 29.11204           -2

But this can get tedious with multiple summaries...

flights %>%
  filter(!is.na(dep_delay)) %>%
  select(dep_delay) %>%
  summarise_each(funs(mean, sd, median))
Source: local data frame [1 x 3]

      mean       sd median
     (dbl)    (dbl)  (dbl)
1 6.133859 29.11204     -2
# same thing
flights %>%
  filter(!is.na(dep_delay)) %>%
  summarise_each(funs(mean, sd, median), dep_delay)
Source: local data frame [1 x 3]

      mean       sd median
     (dbl)    (dbl)  (dbl)
1 6.133859 29.11204     -2
# combine with gather, change names too
flights %>%
  filter(!is.na(dep_delay)) %>%
  summarise_each(funs(mean, stdev = sd, median), dep_delay) %>%
  gather(delay_stat, value)
Source: local data frame [3 x 2]

  delay_stat     value
       (chr)     (dbl)
1       mean  6.133859
2      stdev 29.112035
3     median -2.000000
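
In more recent dplyr and tidyr releases, summarise_each() and funs() are deprecated and gather() has been superseded. A rough equivalent of the last example, assuming dplyr 1.0.0 or later and tidyr 1.0.0 or later, uses across() and pivot_longer(). The toy data frame is hypothetical, so the numbers differ from the flights output above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with one missing delay
toy <- data.frame(dep_delay = c(96, -6, 13, NA, 44))

# across() applies a named list of functions to the chosen columns;
# the result columns are named <column>_<function name>
toy_stats <- toy %>%
  filter(!is.na(dep_delay)) %>%
  summarise(across(dep_delay, list(mean = mean, stdev = sd, median = median)))

# pivot_longer() plays the role of gather(): one row per statistic
toy_long <- toy_stats %>%
  pivot_longer(everything(), names_to = "delay_stat", values_to = "value")

toy_stats
toy_long
```

Note that the column names come out as dep_delay_mean, dep_delay_stdev, and dep_delay_median rather than the bare function names shown in the older output.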

2.1.8 Aggregate functions in base R

Very useful combined with dplyr::summarise.

Function    Description
----------  --------------------
min()       minimum value
max()       maximum value
mean()      mean value
sum()       sum of values
var()       variance
sd()        standard deviation
median()    median value
IQR()       interquartile range

One very important fact: in R, you can take the sum and mean of both numbers and logicals (remember typeof?). When a logical is coerced to a number, TRUE becomes 1 and FALSE becomes 0. Quick aside to show you what this means:

vals <- c(1, 5, 5, 5, NA, 7, NA)
sum(vals)
[1] NA
sum(vals, na.rm = TRUE)
[1] 23
is.na(vals) 
[1] FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE
is.na(vals) %>% as.integer
[1] 0 0 0 0 1 0 1
sum(is.na(vals))
[1] 2
vals == 5
[1] FALSE  TRUE  TRUE  TRUE    NA FALSE    NA
(vals == 5) %>% as.integer
[1]  0  1  1  1 NA  0 NA
sum(vals == 5, na.rm = TRUE)
[1] 3

Taking the mean of a logical vector returns the proportion of TRUE values.

mean(vals) # actual mean
[1] NA
mean(vals, na.rm = TRUE) # actual mean
[1] 4.6
mean(is.na(vals)) # proportion missing
[1] 0.2857143
mean(vals == 5, na.rm = TRUE) # proportion of 5s
[1] 0.6
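
The same trick works inside summarise, which is where it tends to be most useful for tables. A minimal sketch on made-up data follows (toy and the 60-minute cutoff are assumptions for illustration).

```r
library(dplyr)

# Hypothetical toy data with one missing delay
toy <- data.frame(dep_delay = c(96, -6, 13, NA, 44, 70))

delay_summary <- toy %>%
  summarise(
    n_missing = sum(is.na(dep_delay)),                 # count of NAs
    prop_missing = mean(is.na(dep_delay)),             # proportion of NAs
    prop_hour_late = mean(dep_delay >= 60, na.rm = TRUE)  # among non-missing
  )

delay_summary
```

Here prop_hour_late is computed over the five non-missing values only, because na.rm = TRUE removes the NA before the mean is taken.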

2.1.9 Aggregate functions in dplyr

Function      Description
------------  -----------------------------------
n()           number of values in vector
n_distinct()  number of distinct values in vector
first()       first value in vector
last()        last value in vector
nth()         nth value in vector

Let's see how this works with summarise:

# how many flights, planes, carriers, destinations, and origins?
summary_table <- flights %>% 
  summarise(tot_flights = n(),
            tot_planes = n_distinct(tailnum),
            tot_carriers = n_distinct(carrier),
            tot_dests = n_distinct(dest),
            tot_origins = n_distinct(origin))

summary_table
Source: local data frame [1 x 5]

  tot_flights tot_planes tot_carriers tot_dests tot_origins
        (int)      (int)        (int)     (int)       (int)
1      162049       3023           11        71           2
# chain with tidyr functions
summary_table %>% 
  gather(key, value) %>% 
  separate(key, into = c("tot", "entity")) %>% 
  select(-tot, total = value)
Source: local data frame [5 x 2]

    entity  total
     (chr)  (int)
1  flights 162049
2   planes   3023
3 carriers     11
4    dests     71
5  origins      2
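
The summarise call above only exercises n() and n_distinct(). As a rough sketch of the remaining functions in the table, here is a hypothetical five-row data frame (toy, with invented carrier and dest values) summarised with first(), nth(), and last():

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration
toy <- data.frame(
  carrier = c("AA", "AA", "AA", "UA", "UA"),
  dest    = c("SFO", "LAX", "SFO", "DEN", "ORD"),
  stringsAsFactors = FALSE
)

toy_summary <- toy %>%
  summarise(
    n_flights   = n(),               # number of rows
    n_dests     = n_distinct(dest),  # number of unique destinations
    first_dest  = first(dest),       # value in the first row
    second_dest = nth(dest, 2),      # value in the second row
    last_dest   = last(dest)         # value in the last row
  )

toy_summary
```

first(), nth(), and last() depend on row order, so they are usually most meaningful after an arrange().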

2.2 tidyr

We'll work with a made-up data frame:

df <- data.frame(
  id = 1:10,
  date = as.Date('2015-01-01') + 0:9,
  q1_m1_w1 = rnorm(10, 0, 1),
  q1_m1_w2 = rnorm(10, 0, 1),
  q1_m2_w3 = rnorm(10, 0, 1),
  q2_m1_w1 = rnorm(10, 0, 1),
  q2_m2_w1 = rnorm(10, 0, 1),
  q2_m2_w2 = rnorm(10, 0, 1)
)
# HLO
head(df)
  id       date   q1_m1_w1   q1_m1_w2    q1_m2_w3    q2_m1_w1    q2_m2_w1
1  1 2015-01-01 -0.6345459  1.1822500 -1.48655792  1.59441999 -0.31588531
2  2 2015-01-02  1.7045810 -0.7826462 -1.29774614  0.79825505 -0.27955622
3  3 2015-01-03  0.3266713  0.4755565  1.81680783  0.31805142  0.12165836
4  4 2015-01-04 -2.7061799 -0.1657401 -0.80074130  0.11544395  0.07152752
5  5 2015-01-05 -0.9150028  1.1591777  0.07077055 -0.21279434 -2.04686473
6  6 2015-01-06  1.7184398  2.0473497 -0.31425598 -0.09162879  0.17163420
    q2_m2_w2
1  0.6398774
2 -1.1329257
3 -0.8780192
4  0.7658333
5  0.5379359
6 -0.2509163
glimpse(df)
Observations: 10
Variables: 8
$ id       (int) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
$ date     (date) 2015-01-01, 2015-01-02, 2015-01-03, 2015-01-04, 2015...
$ q1_m1_w1 (dbl) -0.6345459, 1.7045810, 0.3266713, -2.7061799, -0.9150...
$ q1_m1_w2 (dbl) 1.1822500, -0.7826462, 0.4755565, -0.1657401, 1.15917...
$ q1_m2_w3 (dbl) -1.48655792, -1.29774614, 1.81680783, -0.80074130, 0....
$ q2_m1_w1 (dbl) 1.59441999, 0.79825505, 0.31805142, 0.11544395, -0.21...
$ q2_m2_w1 (dbl) -0.31588531, -0.27955622, 0.12165836, 0.07152752, -2....
$ q2_m2_w2 (dbl) 0.6398774, -1.1329257, -0.8780192, 0.7658333, 0.53793...

2.2.1 tidyr::gather

First, let's gather...

df_tidy <- df %>%
  gather(key, value, q1_m1_w1:q2_m2_w2)
head(df_tidy)
  id       date      key      value
1  1 2015-01-01 q1_m1_w1 -0.6345459
2  2 2015-01-02 q1_m1_w1  1.7045810
3  3 2015-01-03 q1_m1_w1  0.3266713
4  4 2015-01-04 q1_m1_w1 -2.7061799
5  5 2015-01-05 q1_m1_w1 -0.9150028
6  6 2015-01-06 q1_m1_w1  1.7184398

Now let's gather using subtraction...

df_tidy <- df %>%
  gather(key, value, -id, -date)
head(df_tidy)
  id       date      key      value
1  1 2015-01-01 q1_m1_w1 -0.6345459
2  2 2015-01-02 q1_m1_w1  1.7045810
3  3 2015-01-03 q1_m1_w1  0.3266713
4  4 2015-01-04 q1_m1_w1 -2.7061799
5  5 2015-01-05 q1_m1_w1 -0.9150028
6  6 2015-01-06 q1_m1_w1  1.7184398

2.2.2 tidyr::separate

# separate 1 col into 3 cols
df_sep <- df_tidy %>%
  separate(key, into = c("quarter", "month", "week"))
head(df_sep)
  id       date quarter month week      value
1  1 2015-01-01      q1    m1   w1 -0.6345459
2  2 2015-01-02      q1    m1   w1  1.7045810
3  3 2015-01-03      q1    m1   w1  0.3266713
4  4 2015-01-04      q1    m1   w1 -2.7061799
5  5 2015-01-05      q1    m1   w1 -0.9150028
6  6 2015-01-06      q1    m1   w1  1.7184398
# separate 1 col into 2 cols
df_sep2 <- df_tidy %>%
  separate(key, into = c("quarter", "period"), extra = "merge")
head(df_sep2)
  id       date quarter period      value
1  1 2015-01-01      q1  m1_w1 -0.6345459
2  2 2015-01-02      q1  m1_w1  1.7045810
3  3 2015-01-03      q1  m1_w1  0.3266713
4  4 2015-01-04      q1  m1_w1 -2.7061799
5  5 2015-01-05      q1  m1_w1 -0.9150028
6  6 2015-01-06      q1  m1_w1  1.7184398

stringr vs. tidyr separate by regular expression

2.2.3 tidyr::extract

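extract is essentially the same as separate, except that it pulls the new columns out with regular-expression capture groups instead of splitting on a separator. Let's see how...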

# extract
df_ext <- df_sep2 %>%
  extract(period, into = "month")
head(df_ext)
  id       date quarter month      value
1  1 2015-01-01      q1    m1 -0.6345459
2  2 2015-01-02      q1    m1  1.7045810
3  3 2015-01-03      q1    m1  0.3266713
4  4 2015-01-04      q1    m1 -2.7061799
5  5 2015-01-05      q1    m1 -0.9150028
6  6 2015-01-06      q1    m1  1.7184398
# this gives us same output as separate
df_ext <- df_sep2 %>%
  extract(period, into = c("month", "week"), 
          regex = "([[:alnum:]]+)_([[:alnum:]]+)")
head(df_ext)
  id       date quarter month week      value
1  1 2015-01-01      q1    m1   w1 -0.6345459
2  2 2015-01-02      q1    m1   w1  1.7045810
3  3 2015-01-03      q1    m1   w1  0.3266713
4  4 2015-01-04      q1    m1   w1 -2.7061799
5  5 2015-01-05      q1    m1   w1 -0.9150028
6  6 2015-01-06      q1    m1   w1  1.7184398

2.2.4 tidyr::unite

# let's say we want to combine quarter and month with an underscore
df_uni <- df_sep %>%
  unite(period, quarter:month) # sep = "_" is the default arg
head(df_uni)
  id       date period week      value
1  1 2015-01-01  q1_m1   w1 -0.6345459
2  2 2015-01-02  q1_m1   w1  1.7045810
3  3 2015-01-03  q1_m1   w1  0.3266713
4  4 2015-01-04  q1_m1   w1 -2.7061799
5  5 2015-01-05  q1_m1   w1 -0.9150028
6  6 2015-01-06  q1_m1   w1  1.7184398
# let's say we want to combine quarter and month with nothing
df_uni <- df_sep %>%
  unite(period, quarter:month, sep = "")
head(df_uni)
  id       date period week      value
1  1 2015-01-01   q1m1   w1 -0.6345459
2  2 2015-01-02   q1m1   w1  1.7045810
3  3 2015-01-03   q1m1   w1  0.3266713
4  4 2015-01-04   q1m1   w1 -2.7061799
5  5 2015-01-05   q1m1   w1 -0.9150028
6  6 2015-01-06   q1m1   w1  1.7184398

2.2.5 tidyr::spread

# finally let's spread
df_spread <- df_uni %>%
  spread(week, value) # fill = NA is default arg
head(df_spread)
  id       date period         w1         w2        w3
1  1 2015-01-01   q1m1 -0.6345459  1.1822500        NA
2  1 2015-01-01   q1m2         NA         NA -1.486558
3  1 2015-01-01   q2m1  1.5944200         NA        NA
4  1 2015-01-01   q2m2 -0.3158853  0.6398774        NA
5  2 2015-01-02   q1m1  1.7045810 -0.7826462        NA
6  2 2015-01-02   q1m2         NA         NA -1.297746

2.2.6 Gather multiple sets of columns (gather() %>% separate() %>% spread())

All in one, if we had wanted to essentially "gather" three sets of columns (here, one for each week)...

df_tidiest <- df %>%
  gather(key, value, -id, -date) %>%
  separate(key, into = c("quarter", "month", "week")) %>%
  spread(week, value)
head(df_tidiest)
  id       date quarter month         w1         w2        w3
1  1 2015-01-01      q1    m1 -0.6345459  1.1822500        NA
2  1 2015-01-01      q1    m2         NA         NA -1.486558
3  1 2015-01-01      q2    m1  1.5944200         NA        NA
4  1 2015-01-01      q2    m2 -0.3158853  0.6398774        NA
5  2 2015-01-02      q1    m1  1.7045810 -0.7826462        NA
6  2 2015-01-02      q1    m2         NA         NA -1.297746
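
For comparison, newer versions of tidyr can do the same reshaping in a single call. This is a sketch that assumes tidyr 1.0.0 or later (where pivot_longer is available); it is not part of the workflow above. The data frame is rebuilt here so the example runs on its own, and the seed is arbitrary, so the values differ from the output printed above.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # arbitrary seed; the df above was generated without one
df <- data.frame(
  id = 1:10,
  date = as.Date("2015-01-01") + 0:9,
  q1_m1_w1 = rnorm(10), q1_m1_w2 = rnorm(10), q1_m2_w3 = rnorm(10),
  q2_m1_w1 = rnorm(10), q2_m2_w1 = rnorm(10), q2_m2_w2 = rnorm(10)
)

# Split each column name on "_": the first two pieces become the quarter and
# month columns, and the special name ".value" turns the third piece (w1, w2,
# w3) into the names of the output value columns.
df_pivoted <- df %>%
  pivot_longer(
    cols = -c(id, date),
    names_to = c("quarter", "month", ".value"),
    names_sep = "_"
  )

head(df_pivoted)
```

The result has one row per id, quarter, and month combination, with w1, w2, and w3 as columns and NA where a week was not measured, which matches the shape of df_tidiest.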

2.2.7 tidying challenge

Anscombe's data is available in the datasets package. This package ships with R and is loaded by default, so you do not need to install or attach it. You can see all the datasets available to you by typing data() into your console.

data("anscombe") # load the dataframe

It is not tidy...

head(anscombe)
  x1 x2 x3 x4   y1   y2    y3   y4
1 10 10 10  8 8.04 9.14  7.46 6.58
2  8  8  8  8 6.95 8.14  6.77 5.76
3 13 13 13  8 7.58 8.74 12.74 7.71
4  9  9  9  8 8.81 8.77  7.11 8.84
5 11 11 11  8 8.33 9.26  7.81 8.47
6 14 14 14  8 9.96 8.10  8.84 7.04

We would like to be able to make a table like this:

observation  set   x     y
-----------  ---  --  ----
          1  I    10  8.04
          2  I     8  6.95
          3  I    13  7.58
          4  I     9  8.81
          5  I    11  8.33
          6  I    14  9.96

Or make a plot like this using ggplot2:

In order to make these types of plots, we need to do some tidyr and dplyr legwork. Your challenge: tidy this dataset such that the column names are:

  • observation as an integer (1-11)
  • set as a roman numeral (I / II / III / IV)
  • x values
  • y values

2.2.7.1 Helpful Hints I

Break it down into manageable steps on paper first:

  1. We need to add a column that holds the observation number
  2. We will need to gather all columns BUT the one we just made in (1)
  3. We will then need to separate values in one column because it will contain our previous column names (x1, x2, x3, x4, y1, y2, y3, y4)
  4. We will want to mutate the values in another column from integers to roman numerals
  5. We will need to make sure that x values and y values are in separate columns

2.2.7.2 Helpful Hints II

Perhaps more helpful!

  1. Add a column for observation (dplyr::mutate) (hint: seq_along is a cool function)
  2. Gather columns into key-value pairs (tidyr::gather)
  3. Your key column needs to be separated (tidyr::separate)
  4. Mutate values in the set column from integers into roman numerals (dplyr::mutate) (hint!: as.roman is a cool function)
  5. At this point, your x's and y's are stacked in the same column; you need to spread the key-value pair across multiple columns so that x's are in one column and y's are in another (tidyr::spread)

2.2.7.3 My solution

This is just one of many possible ways to solve this:

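anscombe_tidy <- anscombe %>%
    # 1. number the rows
    mutate(observation = seq_along(x1)) %>%
    # 2. stack x1..y4 into key-value pairs
    gather(key, value, -observation) %>%
    # 3. split after the first character: "x1" becomes "x" and "1"
    separate(key, into = c("variable", "set"), sep = 1) %>%
    # 4. turn 1..4 into I..IV
    mutate(set = as.roman(set)) %>%
    # 5. put the x's and y's into their own columns
    spread(variable, value) %>%
    arrange(set)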
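
With the data in this shape, a plot of the four sets takes only a few lines of ggplot2. The sketch below is one guess at such a plot, not a reproduction of the figure mentioned earlier: the choice of points, a fitted line per panel, and facet_wrap are all assumptions. The tidying pipeline is repeated so the example runs on its own, and set is stored as a factor of roman-numeral labels (my own change) so that it can be used for faceting.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

anscombe_tidy <- anscombe %>%
  mutate(observation = seq_along(x1)) %>%
  gather(key, value, -observation) %>%
  separate(key, into = c("variable", "set"), sep = 1) %>%
  # assumption: keep the roman numerals as factor labels rather than class "roman"
  mutate(set = factor(as.character(as.roman(set)),
                      levels = c("I", "II", "III", "IV"))) %>%
  spread(variable, value) %>%
  arrange(set, observation)

# One panel per set, with a least-squares line in each
p <- ggplot(anscombe_tidy, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ set)

p
```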

2.3 broom

"The broom package takes the messy output of built-in functions in R, such as lm, nls, or t.test, and turns them into tidy data frames." So, broom tidies output from other R functions that are un-tidy.

See here for a list of functions: https://github.com/dgrtwo/broom

Vignette: ftp://cran.r-project.org/pub/R/web/packages/broom/vignettes/broom.html

fit <- lm(mpg ~ qsec + factor(am) + wt + factor(gear), 
          data = mtcars)

Un-tidy output from lm

summary(fit)

Call:
lm(formula = mpg ~ qsec + factor(am) + wt + factor(gear), data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5064 -1.5220 -0.7517  1.3841  4.6345 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)     9.3650     8.3730   1.118  0.27359    
qsec            1.2449     0.3828   3.252  0.00317 ** 
factor(am)1     3.1505     1.9405   1.624  0.11654    
wt             -3.9263     0.7428  -5.286 1.58e-05 ***
factor(gear)4  -0.2682     1.6555  -0.162  0.87257    
factor(gear)5  -0.2697     2.0632  -0.131  0.89698    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.55 on 26 degrees of freedom
Multiple R-squared:  0.8498,    Adjusted R-squared:  0.8209 
F-statistic: 29.43 on 5 and 26 DF,  p-value: 6.379e-10

Tidy output from broom

tidy(fit)
           term   estimate std.error  statistic      p.value
1   (Intercept)  9.3650443 8.3730161  1.1184792 2.735903e-01
2          qsec  1.2449212 0.3828479  3.2517387 3.168128e-03
3   factor(am)1  3.1505178 1.9405171  1.6235455 1.165367e-01
4            wt -3.9263022 0.7427562 -5.2861251 1.581735e-05
5 factor(gear)4 -0.2681630 1.6554617 -0.1619868 8.725685e-01
6 factor(gear)5 -0.2697468 2.0631829 -0.1307430 8.969850e-01
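
Because tidy() returns an ordinary data frame, the usual dplyr verbs apply to model output. As a small sketch (the 0.05 cutoff and the name significant_terms are chosen just for this example), here is how one might keep only the terms with small p-values:

```r
library(dplyr)
library(broom)

fit <- lm(mpg ~ qsec + factor(am) + wt + factor(gear),
          data = mtcars)

# Keep terms with p < 0.05, smallest p-value first
significant_terms <- tidy(fit) %>%
  filter(p.value < 0.05) %>%
  arrange(p.value) %>%
  select(term, estimate, p.value)

significant_terms
```

For this model, only wt and qsec pass the cutoff, which agrees with the stars in the summary(fit) printout.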

3 Best for html output

3.1 The DT package

An excellent tutorial on DT is available at https://rstudio.github.io/DT/.

datatable(iris)

4 For pdf or html

4.1 The kable function in the knitr package

https://www.rdocumentation.org/packages/knitr/versions/1.12.3/topics/kable

kable(head(iris))
Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
------------  -----------  ------------  -----------  -------
         5.1          3.5           1.4          0.2  setosa
         4.9          3.0           1.4          0.2  setosa
         4.7          3.2           1.3          0.2  setosa
         4.6          3.1           1.5          0.2  setosa
         5.0          3.6           1.4          0.2  setosa
         5.4          3.9           1.7          0.4  setosa
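
kable also takes a few arguments for adjusting the result. The following is a minimal sketch using col.names, align, digits, and caption; the header labels and caption text are invented for this example.

```r
library(knitr)

tbl <- kable(
  head(iris),
  col.names = c("Sepal Length", "Sepal Width",
                "Petal Length", "Petal Width", "Species"),
  align   = c("r", "r", "r", "r", "l"),  # one alignment code per column
  digits  = 1,                           # round numeric columns to 1 decimal
  caption = "First six rows of the iris data."
)

tbl
```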

4.2 The xtable package (best for html)

The xtable package can produce both HTML and LaTeX output. Its syntax is very similar to that of kable:

output <- 
  matrix(sprintf("Content %s", LETTERS[1:4]),
         ncol=2, byrow=TRUE)
colnames(output) <- 
  c("1st header", "2nd header")
rownames(output) <- 
  c("1st row", "2nd row")

print(xtable(output, 
             caption="A test table", 
             align = c("l", "c", "r")), 
      type="html")
<!-- html table generated in R 3.2.3 by xtable 1.8-2 package -->
<!-- Thu Oct 27 16:09:49 2016 -->
<table border=1>
<caption align="bottom"> A test table </caption>
<tr> <th>  </th> <th> 1st header </th> <th> 2nd header </th>  </tr>
  <tr> <td> 1st row </td> <td align="center"> Content A </td> <td align="right"> Content B </td> </tr>
  <tr> <td> 2nd row </td> <td align="center"> Content C </td> <td align="right"> Content D </td> </tr>
   </table>

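Note that for the table to render when you knit, rather than appear as raw HTML like the output above, the chunk must set the option results = 'asis', for example with a chunk header of {r, results = 'asis'}: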

print(xtable(output, 
             caption="A test table", 
             align = c("l", "c", "r")), 
      type="html")
A test table

          1st header   2nd header
-------  ------------ ------------
1st row   Content A      Content B
2nd row   Content C      Content D
print(xtable(head(iris)), type = 'html', html.table.attributes = '')
   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
-  ------------  -----------  ------------  -----------  -------
1          5.10         3.50          1.40         0.20  setosa
2          4.90         3.00          1.40         0.20  setosa
3          4.70         3.20          1.30         0.20  setosa
4          4.60         3.10          1.50         0.20  setosa
5          5.00         3.60          1.40         0.20  setosa
6          5.40         3.90          1.70         0.40  setosa

4.3 The pixiedust package (best for PDF)

Remember that broom package we used earlier? We can make this table better...

tidy(fit)
           term   estimate std.error  statistic      p.value
1   (Intercept)  9.3650443 8.3730161  1.1184792 2.735903e-01
2          qsec  1.2449212 0.3828479  3.2517387 3.168128e-03
3   factor(am)1  3.1505178 1.9405171  1.6235455 1.165367e-01
4            wt -3.9263022 0.7427562 -5.2861251 1.581735e-05
5 factor(gear)4 -0.2681630 1.6554617 -0.1619868 8.725685e-01
6 factor(gear)5 -0.2697468 2.0631829 -0.1307430 8.969850e-01

https://cran.r-project.org/web/packages/pixiedust/vignettes/pixiedust.html

# in mtcars, am = 1 is manual, so factor(am)1 compares manual to automatic
dust(fit) %>% 
  sprinkle(cols = "term", 
           replace = c("Intercept", "Quarter Mile Time", "Manual vs. Automatic",
                       "Weight", "Gears: 4 vs. 3", "Gears: 5 vs. 3")) %>%
  sprinkle(cols = c("estimate", "std.error", "statistic"),
           round = 3) %>% 
  sprinkle(cols = "p.value", fn = quote(pvalString(value))) %>% 
  sprinkle_colnames("Term", "Coefficient", "SE", "T-statistic", "P-value")
Term                  Coefficient     SE  T-statistic  P-value
--------------------  -----------  -----  -----------  -------
Intercept                   9.365  8.373        1.118     0.27
Quarter Mile Time           1.245  0.383        3.252    0.003
Manual vs. Automatic        3.151  1.941        1.624     0.12
Weight                     -3.926  0.743       -5.286  < 0.001
Gears: 4 vs. 3             -0.268  1.655       -0.162     0.87
Gears: 5 vs. 3              -0.27  2.063       -0.131      0.9


5 Finally, fonts!

https://github.com/wch/extrafont

Follow all of the installation instructions on GitHub.