Sometimes you may just want to type a table in Markdown and ignore R. Four kinds of tables may be used. The first three kinds presuppose the use of a fixed-width font, such as Courier. The fourth kind can be used with proportionally spaced fonts, as it does not require lining up columns. All of the tables below will render when typed outside of an R code chunk, because pandoc is used to render your Markdown document. Note that these should all work whether you are knitting to HTML or PDF.
This code for a simple table:

```
  Right     Left     Center     Default
-------     ------ ----------   -------
     12     12        12            12
    123     123       123          123
      1     1          1             1

Table: Demonstration of simple table syntax.
```
Produces this simple table:

Right | Left | Center | Default |
---:|:---|:---:|---|
12 | 12 | 12 | 12 |
123 | 123 | 123 | 123 |
1 | 1 | 1 | 1 |
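The headers and table rows must each fit on one line. Column alignments are determined by the position of the header text relative to the dashed line below it:

- If the dashed line is flush with the header text on the right side but extends beyond it on the left, the column is right-aligned.
- If the dashed line is flush with the header text on the left side but extends beyond it on the right, the column is left-aligned.
- If the dashed line extends beyond the header text on both sides, the column is centered.
- If the dashed line is flush with the header text on both sides, the default alignment is used (in most cases, this will be left).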
The column headers may be omitted, provided a dashed line is used to end the table.
This code for a multi-line table:

```
-------------------------------------------------------------
 Centered   Default           Right Left
  Header    Aligned         Aligned Aligned
----------- ------- --------------- -------------------------
   First    row                12.0 Example of a row that
                                    spans multiple lines.

  Second    row                 5.0 Here's another one. Note
                                    the blank line between
                                    rows.
-------------------------------------------------------------

Table: Here's the caption. It, too, may span
multiple lines.
```
Produces this multi-line table:

Centered Header | Default Aligned | Right Aligned | Left Aligned |
:---:|---|---:|:---|
First | row | 12.0 | Example of a row that spans multiple lines. |
Second | row | 5.0 | Here's another one. Note the blank line between rows. |
This code for a grid table:

```
: Sample grid table.

+---------------+---------------+--------------------+
| Fruit         | Price         | Advantages         |
+===============+===============+====================+
| Bananas       | $1.34         | - built-in wrapper |
|               |               | - bright color     |
+---------------+---------------+--------------------+
| Oranges       | $2.10         | - cures scurvy     |
|               |               | - tasty            |
+---------------+---------------+--------------------+
```
Produces this grid table:

Fruit | Price | Advantages |
---|---|---|
Bananas | $1.34 | built-in wrapper; bright color |
Oranges | $2.10 | cures scurvy; tasty |
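In the pandoc versions this was written for, grid tables did not support alignments or cells that span multiple columns or rows; newer pandoc releases have added both, so check the pandoc manual for the version you have installed.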
This code for a pipe table:

```
| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
|   12  |  12  |    12   |    12  |
|  123  |  123 |   123   |   123  |
|    1  |    1 |     1   |     1  |

  : Demonstration of pipe table syntax.
```
Produces this pipe table:

Right | Left | Default | Center |
---:|:---|---|:---:|
12 | 12 | 12 | 12 |
123 | 123 | 123 | 123 |
1 | 1 | 1 | 1 |
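If you type pipe tables by hand often, it can help to see exactly how the colons in the separator row map onto column alignment. Below is a minimal base R sketch; to_pipe_table() is a hypothetical helper written only for illustration, not part of any package (in practice, knitr::kable(x, format = "pipe") will write pipe tables for you).

```r
# Hypothetical helper (not from any package): format a data frame as a
# pandoc pipe table. `align` takes one code per column:
# "l" = left, "r" = right, "c" = center, "d" = default.
to_pipe_table <- function(df, align = rep("d", ncol(df))) {
  stopifnot(length(align) == ncol(df))
  # The colons in the separator row encode each column's alignment
  rule_cells <- vapply(align, function(a) {
    switch(a,
           l = ":---",
           r = "---:",
           c = ":---:",
           d = "---",
           stop("unknown alignment code: ", a))
  }, character(1))
  row_line <- function(cells) paste0("| ", paste(cells, collapse = " | "), " |")
  header <- row_line(names(df))
  rule <- paste0("|", paste(rule_cells, collapse = "|"), "|")
  # apply() turns each row into a vector; trimws() drops any format() padding
  body <- unname(apply(df, 1, function(r) row_line(trimws(r))))
  c(header, rule, body)
}

demo <- data.frame(Right = c(12, 123, 1), Left = c(12, 123, 1),
                   Default = c(12, 123, 1), Center = c(12, 123, 1))
cat(to_pipe_table(demo, align = c("r", "l", "d", "c")), sep = "\n")
```

This prints the same pipe table as the example above, with "|---:|:---|---|:---:|" as its separator row. The cells are not padded to equal widths, which pandoc does not require for pipe tables.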
If you want to make tables that include R output (for example, means, variances, or output from models), there are two steps:
This section covers (1), which is easy in R. But, although there are some nice options for (2) within R Markdown via various packages, I am not dogmatic about doing everything in R Markdown, especially things like (2).
dplyr
We'll use the pnwflights14 package to practice our dplyr skills. We need to download the package from GitHub using devtools.
# once per machine
install.packages("devtools")
devtools::install_github("ismayc/pnwflights14")
Now, we need to load the flights dataset from the pnwflights14 package.
# once per work session
data("flights", package = "pnwflights14")
Brief high-level overview (HLO) of the flights data:
dim(flights)
[1] 162049 16
glimpse(flights)
Observations: 162,049
Variables: 16
$ year (int) 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014...
$ month (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ day (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ dep_time (int) 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 5...
$ dep_delay (dbl) 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3,...
$ arr_time (int) 235, 738, 548, 800, 325, 747, 936, 1148, 917, 1334, ...
$ arr_delay (dbl) 70, -23, -4, -23, 43, 88, 219, 15, 24, -6, 4, 12, -1...
$ carrier (chr) "AS", "US", "UA", "US", "AS", "DL", "UA", "UA", "UA"...
$ tailnum (chr) "N508AS", "N195UW", "N37422", "N547UW", "N762AS", "N...
$ flight (int) 145, 1830, 1609, 466, 121, 1823, 1481, 229, 1576, 47...
$ origin (chr) "PDX", "SEA", "PDX", "PDX", "SEA", "SEA", "SEA", "PD...
$ dest (chr) "ANC", "CLT", "IAH", "CLT", "ANC", "DTW", "ORD", "IA...
$ air_time (dbl) 194, 252, 201, 251, 201, 224, 202, 217, 136, 268, 13...
$ distance (dbl) 1542, 2279, 1825, 2282, 1448, 1927, 1721, 1825, 1024...
$ hour (dbl) 0, 0, 0, 0, 0, 0, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6...
$ minute (dbl) 1, 4, 8, 28, 34, 37, 46, 26, 27, 36, 41, 49, 50, 57,...
names(flights)
[1] "year" "month" "day" "dep_time" "dep_delay"
[6] "arr_time" "arr_delay" "carrier" "tailnum" "flight"
[11] "origin" "dest" "air_time" "distance" "hour"
[16] "minute"
dplyr::select
Use select to specify which columns in a dataframe you'd like to keep by name. Base R can also select by name (for example, flights[, c("carrier", "flight")] or subset()), but select() is more concise: you can use bare column names, ranges, and helper functions. Most of the time, you keep track of your variables by name (like carrier) rather than position (the 8th column).
# keep these 2 cols
mini_flights <- flights %>%
select(carrier, flight)
glimpse(mini_flights)
Observations: 162,049
Variables: 2
$ carrier (chr) "AS", "US", "UA", "US", "AS", "DL", "UA", "UA", "UA", ...
$ flight (int) 145, 1830, 1609, 466, 121, 1823, 1481, 229, 1576, 478,...
# keep first five cols
first_five <- flights %>%
select(year, month, day, dep_time, dep_delay)
glimpse(first_five)
Observations: 162,049
Variables: 5
$ year (int) 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014...
$ month (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ day (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ dep_time (int) 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 5...
$ dep_delay (dbl) 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3,...
# alternatively, specify range
first_five <- flights %>%
select(year:dep_delay)
glimpse(first_five)
Observations: 162,049
Variables: 5
$ year (int) 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014...
$ month (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ day (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ dep_time (int) 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 5...
$ dep_delay (dbl) 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3,...
We can also choose columns by negation; that is, you can specify which columns to drop instead of which to keep. All variables not listed are kept.
# we can also use negation
all_but_year <- flights %>%
select(-year)
glimpse(all_but_year)
Observations: 162,049
Variables: 15
$ month (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ day (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ dep_time (int) 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 5...
$ dep_delay (dbl) 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3,...
$ arr_time (int) 235, 738, 548, 800, 325, 747, 936, 1148, 917, 1334, ...
$ arr_delay (dbl) 70, -23, -4, -23, 43, 88, 219, 15, 24, -6, 4, 12, -1...
$ carrier (chr) "AS", "US", "UA", "US", "AS", "DL", "UA", "UA", "UA"...
$ tailnum (chr) "N508AS", "N195UW", "N37422", "N547UW", "N762AS", "N...
$ flight (int) 145, 1830, 1609, 466, 121, 1823, 1481, 229, 1576, 47...
$ origin (chr) "PDX", "SEA", "PDX", "PDX", "SEA", "SEA", "SEA", "PD...
$ dest (chr) "ANC", "CLT", "IAH", "CLT", "ANC", "DTW", "ORD", "IA...
$ air_time (dbl) 194, 252, 201, 251, 201, 224, 202, 217, 136, 268, 13...
$ distance (dbl) 1542, 2279, 1825, 2282, 1448, 1927, 1721, 1825, 1024...
$ hour (dbl) 0, 0, 0, 0, 0, 0, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6...
$ minute (dbl) 1, 4, 8, 28, 34, 37, 46, 26, 27, 36, 41, 49, 50, 57,...
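For comparison with select(), here is a minimal base R sketch of keeping, dropping, and reordering columns by name. The small data frame is made up for illustration, using values copied from the first rows of flights shown above.

```r
# A tiny made-up stand-in for flights (values copied from the output above)
mini <- data.frame(
  year = c(2014L, 2014L, 2014L),
  carrier = c("AS", "US", "UA"),
  flight = c(145L, 1830L, 1609L),
  origin = c("PDX", "SEA", "PDX"),
  stringsAsFactors = FALSE
)

# keep columns by name, like select(carrier, flight)
keep_idx <- mini[, c("carrier", "flight")]
keep_sub <- subset(mini, select = c(carrier, flight))

# drop a column by name, like select(-year)
drop_idx <- mini[, setdiff(names(mini), "year")]
drop_sub <- subset(mini, select = -year)

# move a column to the front, like select(origin, everything())
reordered <- mini[, c("origin", setdiff(names(mini), "origin"))]
```

One base R gotcha that select() avoids: mini[, "carrier"] with a single name returns a plain vector, so you need mini[, "carrier", drop = FALSE] to keep a data frame.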
dplyr::select comes with several other helper functions...
depart <- flights %>%
select(starts_with("dep_"))
glimpse(depart)
Observations: 162,049
Variables: 2
$ dep_time (int) 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 5...
$ dep_delay (dbl) 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3,...
times <- flights %>%
select(contains("time"))
glimpse(times)
Observations: 162,049
Variables: 3
$ dep_time (int) 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 55...
$ arr_time (int) 235, 738, 548, 800, 325, 747, 936, 1148, 917, 1334, 9...
$ air_time (dbl) 194, 252, 201, 251, 201, 224, 202, 217, 136, 268, 130...
# here I am not creating a new dataframe
flights %>%
select(-contains("time"))
Source: local data frame [162,049 x 13]
year month day dep_delay arr_delay carrier tailnum flight origin
(int) (int) (int) (dbl) (dbl) (chr) (chr) (int) (chr)
1 2014 1 1 96 70 AS N508AS 145 PDX
2 2014 1 1 -6 -23 US N195UW 1830 SEA
3 2014 1 1 13 -4 UA N37422 1609 PDX
4 2014 1 1 -2 -23 US N547UW 466 PDX
5 2014 1 1 44 43 AS N762AS 121 SEA
6 2014 1 1 82 88 DL N806DN 1823 SEA
7 2014 1 1 227 219 UA N14219 1481 SEA
8 2014 1 1 -4 15 UA N813UA 229 PDX
9 2014 1 1 7 24 UA N75433 1576 SEA
10 2014 1 1 1 -6 UA N574UA 478 SEA
.. ... ... ... ... ... ... ... ... ...
Variables not shown: dest (chr), distance (dbl), hour (dbl), minute (dbl)
delays <- flights %>%
select(ends_with("delay"))
glimpse(delays)
Observations: 162,049
Variables: 2
$ dep_delay (dbl) 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3,...
$ arr_delay (dbl) 70, -23, -4, -23, 43, 88, 219, 15, 24, -6, 4, 12, -1...
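The helpers above work by matching patterns against column names. As a rough base R analogue (a sketch, not how tidyselect is implemented), the same matches can be reproduced on the names printed by names(flights). One difference: these base R matches are case-sensitive, whereas starts_with(), ends_with(), and contains() ignore case by default.

```r
# Column names of the flights data, as printed by names(flights) above
flight_cols <- c("year", "month", "day", "dep_time", "dep_delay",
                 "arr_time", "arr_delay", "carrier", "tailnum", "flight",
                 "origin", "dest", "air_time", "distance", "hour", "minute")

# roughly starts_with("dep_")
dep_cols <- flight_cols[startsWith(flight_cols, "dep_")]
# roughly contains("time"); fixed = TRUE treats the pattern literally
time_cols <- flight_cols[grepl("time", flight_cols, fixed = TRUE)]
# roughly ends_with("delay")
delay_cols <- flight_cols[endsWith(flight_cols, "delay")]
# roughly -contains("time")
no_time_cols <- flight_cols[!grepl("time", flight_cols, fixed = TRUE)]
```

These give the same columns as the select() calls above: 2 columns for dep_, 3 for time, 2 for delay, and 13 left over after dropping the time columns.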
One of my favorite select helper functions is everything(), which allows you to use select to keep all your variables but easily rearrange the columns without having to list all the variables to keep/drop.
new_order <- flights %>%
select(origin, dest, everything())
head(new_order)
Source: local data frame [6 x 16]
origin dest year month day dep_time dep_delay arr_time arr_delay
(chr) (chr) (int) (int) (int) (int) (dbl) (int) (dbl)
1 PDX ANC 2014 1 1 1 96 235 70
2 SEA CLT 2014 1 1 4 -6 738 -23
3 PDX IAH 2014 1 1 8 13 548 -4
4 PDX CLT 2014 1 1 28 -2 800 -23
5 SEA ANC 2014 1 1 34 44 325 43
6 SEA DTW 2014 1 1 37 82 747 88
Variables not shown: carrier (chr), tailnum (chr), flight (int), air_time
(dbl), distance (dbl), hour (dbl), minute (dbl)
# with negation
new_order2 <- flights %>%
select(origin, dest, everything(), -year)
head(new_order2)
Source: local data frame [6 x 15]
origin dest month day dep_time dep_delay arr_time arr_delay carrier
(chr) (chr) (int) (int) (int) (dbl) (int) (dbl) (chr)
1 PDX ANC 1 1 1 96 235 70 AS
2 SEA CLT 1 1 4 -6 738 -23 US
3 PDX IAH 1 1 8 13 548 -4 UA
4 PDX CLT 1 1 28 -2 800 -23 US
5 SEA ANC 1 1 34 44 325 43 AS
6 SEA DTW 1 1 37 82 747 88 DL
Variables not shown: tailnum (chr), flight (int), air_time (dbl), distance
(dbl), hour (dbl), minute (dbl)
We can also rename variables within select.
flights2 <- flights %>%
select(tail_num = tailnum, everything())
head(flights2)
Source: local data frame [6 x 16]
tail_num year month day dep_time dep_delay arr_time arr_delay carrier
(chr) (int) (int) (int) (int) (dbl) (int) (dbl) (chr)
1 N508AS 2014 1 1 1 96 235 70 AS
2 N195UW 2014 1 1 4 -6 738 -23 US
3 N37422 2014 1 1 8 13 548 -4 UA
4 N547UW 2014 1 1 28 -2 800 -23 US
5 N762AS 2014 1 1 34 44 325 43 AS
6 N806DN 2014 1 1 37 82 747 88 DL
Variables not shown: flight (int), origin (chr), dest (chr), air_time
(dbl), distance (dbl), hour (dbl), minute (dbl)
If you don't want to move the renamed variables within your dataframe, you can use the rename function.
flights3 <- flights %>%
rename(tail_num = tailnum)
glimpse(flights3)
Observations: 162,049
Variables: 16
$ year (int) 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014...
$ month (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ day (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ dep_time (int) 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 5...
$ dep_delay (dbl) 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3,...
$ arr_time (int) 235, 738, 548, 800, 325, 747, 936, 1148, 917, 1334, ...
$ arr_delay (dbl) 70, -23, -4, -23, 43, 88, 219, 15, 24, -6, 4, 12, -1...
$ carrier (chr) "AS", "US", "UA", "US", "AS", "DL", "UA", "UA", "UA"...
$ tail_num (chr) "N508AS", "N195UW", "N37422", "N547UW", "N762AS", "N...
$ flight (int) 145, 1830, 1609, 466, 121, 1823, 1481, 229, 1576, 47...
$ origin (chr) "PDX", "SEA", "PDX", "PDX", "SEA", "SEA", "SEA", "PD...
$ dest (chr) "ANC", "CLT", "IAH", "CLT", "ANC", "DTW", "ORD", "IA...
$ air_time (dbl) 194, 252, 201, 251, 201, 224, 202, 217, 136, 268, 13...
$ distance (dbl) 1542, 2279, 1825, 2282, 1448, 1927, 1721, 1825, 1024...
$ hour (dbl) 0, 0, 0, 0, 0, 0, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6...
$ minute (dbl) 1, 4, 8, 28, 34, 37, 46, 26, 27, 36, 41, 49, 50, 57,...
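To make the difference between renaming in select() and rename() easy to see, here is a minimal sketch on a made-up one-row data frame (values copied from the first row of flights). It assumes dplyr is installed.

```r
library(dplyr)

# Tiny made-up data frame using values from the first row of flights
mini <- data.frame(year = 2014L, carrier = "AS", tailnum = "N508AS")

# select() with everything(): the renamed column moves to the front
moved <- mini %>% select(tail_num = tailnum, everything())
# rename(): the renamed column keeps its original position
in_place <- mini %>% rename(tail_num = tailnum)

names(moved)
names(in_place)
```

Both keep every column; only the position of tail_num differs.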
dplyr::filter
# flights taking off from PDX
pdx <- flights %>%
filter(origin == "PDX")
head(pdx)
Source: local data frame [6 x 16]
year month day dep_time dep_delay arr_time arr_delay carrier tailnum
(int) (int) (int) (int) (dbl) (int) (dbl) (chr) (chr)
1 2014 1 1 1 96 235 70 AS N508AS
2 2014 1 1 8 13 548 -4 UA N37422
3 2014 1 1 28 -2 800 -23 US N547UW
4 2014 1 1 526 -4 1148 15 UA N813UA
5 2014 1 1 541 1 911 4 UA N36476
6 2014 1 1 549 24 907 12 US N548UW
Variables not shown: flight (int), origin (chr), dest (chr), air_time
(dbl), distance (dbl), hour (dbl), minute (dbl)
# january flights from PDX
pdx_jan <- flights %>%
filter(origin == "PDX", month == 1) # the comma is an "and"
head(pdx_jan)
Source: local data frame [6 x 16]
year month day dep_time dep_delay arr_time arr_delay carrier tailnum
(int) (int) (int) (int) (dbl) (int) (dbl) (chr) (chr)
1 2014 1 1 1 96 235 70 AS N508AS
2 2014 1 1 8 13 548 -4 UA N37422
3 2014 1 1 28 -2 800 -23 US N547UW
4 2014 1 1 526 -4 1148 15 UA N813UA
5 2014 1 1 541 1 911 4 UA N36476
6 2014 1 1 549 24 907 12 US N548UW
Variables not shown: flight (int), origin (chr), dest (chr), air_time
(dbl), distance (dbl), hour (dbl), minute (dbl)
# flights to ATL (Atlanta) or BNA (Nashville)
to_south <- flights %>%
filter(dest == "ATL" | dest == "BNA") %>% # | is "or"
select(origin, dest, everything())
head(to_south)
Source: local data frame [6 x 16]
origin dest year month day dep_time dep_delay arr_time arr_delay
(chr) (chr) (int) (int) (int) (int) (dbl) (int) (dbl)
1 SEA ATL 2014 1 1 624 -6 1401 -6
2 SEA ATL 2014 1 1 802 -3 1533 -17
3 SEA ATL 2014 1 1 824 -1 1546 -14
4 PDX ATL 2014 1 1 944 -6 1727 -8
5 PDX ATL 2014 1 1 1054 94 1807 84
6 SEA ATL 2014 1 1 1158 6 1915 -14
Variables not shown: carrier (chr), tailnum (chr), flight (int), air_time
(dbl), distance (dbl), hour (dbl), minute (dbl)
# flights from PDX to ATL (Atlanta) or BNA (Nashville)
pdx_to_south <- flights %>%
filter(origin == "PDX", dest == "ATL" | dest == "BNA") %>% # | is "or"
select(origin, dest, everything())
head(pdx_to_south)
Source: local data frame [6 x 16]
origin dest year month day dep_time dep_delay arr_time arr_delay
(chr) (chr) (int) (int) (int) (int) (dbl) (int) (dbl)
1 PDX ATL 2014 1 1 944 -6 1727 -8
2 PDX ATL 2014 1 1 1054 94 1807 84
3 PDX ATL 2014 1 1 1323 -2 2038 -15
4 PDX ATL 2014 1 1 2253 8 611 4
5 PDX ATL 2014 1 2 627 -3 1350 -7
6 PDX ATL 2014 1 2 918 -2 1643 -2
Variables not shown: carrier (chr), tailnum (chr), flight (int), air_time
(dbl), distance (dbl), hour (dbl), minute (dbl)
# alternatively, using group membership
south_dests <- c("ATL", "BNA")
pdx_to_south2 <- flights %>%
filter(origin == "PDX", dest %in% south_dests) %>%
select(origin, dest, everything())
head(pdx_to_south2)
Source: local data frame [6 x 16]
origin dest year month day dep_time dep_delay arr_time arr_delay
(chr) (chr) (int) (int) (int) (int) (dbl) (int) (dbl)
1 PDX ATL 2014 1 1 944 -6 1727 -8
2 PDX ATL 2014 1 1 1054 94 1807 84
3 PDX ATL 2014 1 1 1323 -2 2038 -15
4 PDX ATL 2014 1 1 2253 8 611 4
5 PDX ATL 2014 1 2 627 -3 1350 -7
6 PDX ATL 2014 1 2 918 -2 1643 -2
Variables not shown: carrier (chr), tailnum (chr), flight (int), air_time
(dbl), distance (dbl), hour (dbl), minute (dbl)
# flights delayed by 1 hour or more
delay_1plus <- flights %>%
filter(dep_delay >= 60)
head(delay_1plus)
Source: local data frame [6 x 16]
year month day dep_time dep_delay arr_time arr_delay carrier tailnum
(int) (int) (int) (int) (dbl) (int) (dbl) (chr) (chr)
1 2014 1 1 1 96 235 70 AS N508AS
2 2014 1 1 37 82 747 88 DL N806DN
3 2014 1 1 346 227 936 219 UA N14219
4 2014 1 1 650 90 1037 91 US N626AW
5 2014 1 1 959 164 1137 157 AS N534AS
6 2014 1 1 1008 68 1242 64 AS N788AS
Variables not shown: flight (int), origin (chr), dest (chr), air_time
(dbl), distance (dbl), hour (dbl), minute (dbl)
# flights delayed by 1 hour, but not more than 2 hours
delay_1hr <- flights %>%
filter(dep_delay >= 60, dep_delay < 120)
head(delay_1hr)
Source: local data frame [6 x 16]
year month day dep_time dep_delay arr_time arr_delay carrier tailnum
(int) (int) (int) (int) (dbl) (int) (dbl) (chr) (chr)
1 2014 1 1 1 96 235 70 AS N508AS
2 2014 1 1 37 82 747 88 DL N806DN
3 2014 1 1 650 90 1037 91 US N626AW
4 2014 1 1 1008 68 1242 64 AS N788AS
5 2014 1 1 1014 75 1613 81 UA N37408
6 2014 1 1 1036 81 1408 63 OO N218AG
Variables not shown: flight (int), origin (chr), dest (chr), air_time
(dbl), distance (dbl), hour (dbl), minute (dbl)
range(delay_1hr$dep_delay, na.rm = TRUE)
[1] 60 119
# even more efficient using between (always inclusive)
delay_bwn <- flights %>%
filter(between(dep_delay, 60, 119))
head(delay_bwn)
Source: local data frame [6 x 16]
year month day dep_time dep_delay arr_time arr_delay carrier tailnum
(int) (int) (int) (int) (dbl) (int) (dbl) (chr) (chr)
1 2014 1 1 1 96 235 70 AS N508AS
2 2014 1 1 37 82 747 88 DL N806DN
3 2014 1 1 650 90 1037 91 US N626AW
4 2014 1 1 1008 68 1242 64 AS N788AS
5 2014 1 1 1014 75 1613 81 UA N37408
6 2014 1 1 1036 81 1408 63 OO N218AG
Variables not shown: flight (int), origin (chr), dest (chr), air_time
(dbl), distance (dbl), hour (dbl), minute (dbl)
range(delay_bwn$dep_delay, na.rm = TRUE)
[1] 60 119
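A quick way to convince yourself that between() includes both endpoints is to run it on a handful of made-up delay values, including the boundaries and a missing value. This sketch assumes dplyr is installed.

```r
library(dplyr)

delays <- c(59, 60, 90, 119, 120, NA)  # made-up values around the boundaries

# between() keeps both endpoints
in_range <- between(delays, 60, 119)
# the equivalent explicit comparison
explicit <- delays >= 60 & delays <= 119

in_range
```

The missing delay gives NA rather than FALSE. filter() only keeps rows where the condition is TRUE, so rows with a missing dep_delay are dropped in both versions of the filter above.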
Comparison operators (see ?Comparison) are very useful when combined with dplyr::filter:

Operator | Description |
---|---|
< | less than |
<= | less than or equal to |
> | greater than |
>= | greater than or equal to |
== | exactly equal to |
!= | not equal to |
%in% | group membership |
is.na(x) | is NA |
!is.na(x) | is not NA |
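One subtle difference between == and %in% is how they treat missing values: == returns NA when the value being compared is missing, while %in% always returns TRUE or FALSE. A small base R sketch with made-up destinations:

```r
dest <- c("ATL", "BNA", "SEA", NA)  # made-up destinations, one missing

# == propagates the missing value
by_equals <- dest == "ATL" | dest == "BNA"
# %in% treats NA as "not a member"
by_membership <- dest %in% c("ATL", "BNA")

by_equals
by_membership
```

Inside filter() both pick the same rows, because rows where the condition is NA are dropped anyway; the difference shows up when you store the result as a column with mutate().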
Logical operators (see ?base::Logic):

Operator | Description |
---|---|
x & y | x AND y (logical and) |
x \| y | x OR y (logical or) |
xor(x, y) | exactly x or y |
!x | not x (logical negation) |
any(x) | any true |
all(x) | all true |
isTRUE(x) | test if x is TRUE |
Logical or (|) is inclusive, so x | y really means: x or y or both.
Exclusive or (xor) is exclusive, so xor(x, y) really means: x or y, but not both.
x <- c(0, 1, 0, 1)
y <- c(0, 0, 1, 1)
boolean_or <- x | y
exclusive_or <- xor(x, y)
cbind(x, y, boolean_or, exclusive_or)
x y boolean_or exclusive_or
[1,] 0 0 0 0
[2,] 1 0 1 1
[3,] 0 1 1 1
[4,] 1 1 1 0
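The remaining functions in the logic table, any(), all(), and isTRUE(), reduce a whole logical vector to a single value. A small base R sketch, including how they treat NA:

```r
x <- c(TRUE, FALSE, TRUE)
any(x)          # TRUE: at least one element is TRUE
all(x)          # FALSE: not every element is TRUE
!x              # element-wise negation: FALSE TRUE FALSE
isTRUE(x)       # FALSE: isTRUE() only accepts a single TRUE, not a vector
isTRUE(all(x))  # FALSE

y <- c(TRUE, NA)
any(y)                # TRUE: the NA cannot change the answer
all(y)                # NA: the missing value might be FALSE
all(y, na.rm = TRUE)  # TRUE once the NA is dropped
```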
dplyr::arrange
# default is ascending order
flights %>%
arrange(year, month, day)
Source: local data frame [162,049 x 16]
year month day dep_time dep_delay arr_time arr_delay carrier tailnum
(int) (int) (int) (int) (dbl) (int) (dbl) (chr) (chr)
1 2014 1 1 1 96 235 70 AS N508AS
2 2014 1 1 4 -6 738 -23 US N195UW
3 2014 1 1 8 13 548 -4 UA N37422
4 2014 1 1 28 -2 800 -23 US N547UW
5 2014 1 1 34 44 325 43 AS N762AS
6 2014 1 1 37 82 747 88 DL N806DN
7 2014 1 1 346 227 936 219 UA N14219
8 2014 1 1 526 -4 1148 15 UA N813UA
9 2014 1 1 527 7 917 24 UA N75433
10 2014 1 1 536 1 1334 -6 UA N574UA
.. ... ... ... ... ... ... ... ... ...
Variables not shown: flight (int), origin (chr), dest (chr), air_time
(dbl), distance (dbl), hour (dbl), minute (dbl)
# descending order
flights %>%
arrange(desc(year), desc(month), desc(day))
Source: local data frame [162,049 x 16]
year month day dep_time dep_delay arr_time arr_delay carrier tailnum
(int) (int) (int) (int) (dbl) (int) (dbl) (chr) (chr)
1 2014 12 31 2 12 601 31 AA N3JKAA
2 2014 12 31 27 -3 623 3 AA N3EWAA
3 2014 12 31 39 14 324 4 AS N762AS
4 2014 12 31 40 0 549 0 DL N757AT
5 2014 12 31 52 -8 917 -21 AA N3JFAA
6 2014 12 31 54 4 621 17 DL N128DL
7 2014 12 31 56 61 848 80 DL N655DL
8 2014 12 31 512 -3 904 4 US N653AW
9 2014 12 31 515 -5 855 5 US N580UW
10 2014 12 31 534 4 859 7 UA N34460
.. ... ... ... ... ... ... ... ... ...
Variables not shown: flight (int), origin (chr), dest (chr), air_time
(dbl), distance (dbl), hour (dbl), minute (dbl)
dplyr::distinct
Note: we are going to start chaining multiple pipe operators together now. You can chain all tidyr and dplyr functions together!
# all unique origin-dest combinations
flights %>%
select(origin, dest) %>%
distinct
Source: local data frame [115 x 2]
origin dest
(chr) (chr)
1 PDX ANC
2 SEA CLT
3 PDX IAH
4 PDX CLT
5 SEA ANC
6 SEA DTW
7 SEA ORD
8 SEA DEN
9 SEA EWR
10 PDX DEN
.. ... ...
# all unique destinations from PDX (there are 49)
from_pdx <- flights %>%
filter(origin == "PDX") %>%
select(origin, dest) %>%
distinct() # origin is always PDX here, so this keeps one row per dest
head(from_pdx)
Source: local data frame [6 x 2]
origin dest
(chr) (chr)
1 PDX ANC
2 PDX IAH
3 PDX CLT
4 PDX DEN
5 PDX PHX
6 PDX ORD
dplyr::mutate
# add total delay variable
flights %>%
mutate(tot_delay = dep_delay + arr_delay) %>%
select(origin, dest, ends_with("delay"), everything())
Source: local data frame [162,049 x 17]
origin dest dep_delay arr_delay tot_delay year month day dep_time
(chr) (chr) (dbl) (dbl) (dbl) (int) (int) (int) (int)
1 PDX ANC 96 70 166 2014 1 1 1
2 SEA CLT -6 -23 -29 2014 1 1 4
3 PDX IAH 13 -4 9 2014 1 1 8
4 PDX CLT -2 -23 -25 2014 1 1 28
5 SEA ANC 44 43 87 2014 1 1 34
6 SEA DTW 82 88 170 2014 1 1 37
7 SEA ORD 227 219 446 2014 1 1 346
8 PDX IAH -4 15 11 2014 1 1 526
9 SEA DEN 7 24 31 2014 1 1 527
10 SEA EWR 1 -6 -5 2014 1 1 536
.. ... ... ... ... ... ... ... ... ...
Variables not shown: arr_time (int), carrier (chr), tailnum (chr), flight
(int), air_time (dbl), distance (dbl), hour (dbl), minute (dbl)
# which flights that were delayed at departure still arrived on time or early?
arrivals <- flights %>%
mutate(arr_ok = ifelse(dep_delay > 0 & arr_delay <= 0, 1, 0)) %>%
select(origin, dest, ends_with("delay"), carrier, arr_ok)
# peek at it
arrivals %>%
filter(arr_ok == 1) %>%
head
Source: local data frame [6 x 6]
origin dest dep_delay arr_delay carrier arr_ok
(chr) (chr) (dbl) (dbl) (chr) (dbl)
1 PDX IAH 13 -4 UA 1
2 SEA EWR 1 -6 UA 1
3 SEA SAN 2 -12 AS 1
4 PDX EWR 2 -19 UA 1
5 SEA IAH 13 -4 UA 1
6 PDX IAD 10 -4 UA 1
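Because a logical can be coerced directly to 1 and 0, the ifelse() call above can also be written with as.integer(). A small base R sketch with made-up delay values (including missing ones) shows that the two give the same coding:

```r
# Made-up delays; NA marks a missing value
dep_delay <- c(13, -6, 1, NA, 5)
arr_delay <- c(-4, -23, -6, 10, NA)

# the ifelse() version used above
ok_ifelse <- ifelse(dep_delay > 0 & arr_delay <= 0, 1, 0)
# coercing the logical directly (gives integers instead of doubles)
ok_int <- as.integer(dep_delay > 0 & arr_delay <= 0)

ok_int
```

Note the fourth value: NA & FALSE is FALSE (the answer is FALSE whatever the missing value is), so that flight gets a 0, while the fifth value, TRUE & NA, stays NA.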
dplyr::summarise (or dplyr::summarize) collapses a dataframe into one row.
flights %>%
summarise(mean(dep_delay, na.rm = TRUE))
Source: local data frame [1 x 1]
mean(dep_delay, na.rm = TRUE)
(dbl)
1 6.133859
# we can also name that variable, and summarise multiple variables
flights %>%
summarise(mean_delay = mean(dep_delay, na.rm = TRUE),
sd_delay = sd(dep_delay, na.rm = TRUE),
median_delay = median(dep_delay, na.rm = TRUE))
Source: local data frame [1 x 3]
mean_delay sd_delay median_delay
(dbl) (dbl) (dbl)
1 6.133859 29.11204 -2
But this can get tedious with multiple summaries...
flights %>%
filter(!is.na(dep_delay)) %>%
select(dep_delay) %>%
summarise_each(funs(mean, sd, median)) # deprecated in current dplyr; see across()
Source: local data frame [1 x 3]
mean sd median
(dbl) (dbl) (dbl)
1 6.133859 29.11204 -2
# same thing
flights %>%
filter(!is.na(dep_delay)) %>%
summarise_each(funs(mean, sd, median), dep_delay)
Source: local data frame [1 x 3]
mean sd median
(dbl) (dbl) (dbl)
1 6.133859 29.11204 -2
# combine with gather, change names too
flights %>%
filter(!is.na(dep_delay)) %>%
summarise_each(funs(mean, stdev = sd, median), dep_delay) %>%
gather(delay_stat, value)
Source: local data frame [3 x 2]
delay_stat value
(chr) (dbl)
1 mean 6.133859
2 stdev 29.112035
3 median -2.000000
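summarise_each() and funs() are deprecated in current dplyr; since dplyr 1.0.0, across() is the replacement. A sketch of the same idea, assuming dplyr 1.0.0 or later and using a made-up vector of delays (the first four values come from the flights output above):

```r
library(dplyr)

delays <- data.frame(dep_delay = c(96, -6, 13, -2, NA))

res <- delays %>%
  filter(!is.na(dep_delay)) %>%
  summarise(across(dep_delay, list(mean = mean, sd = sd, median = median)))

res
```

By default across() names the results "{.col}_{.fn}", so the columns are dep_delay_mean, dep_delay_sd, and dep_delay_median; pass .names = "{.fn}" if you prefer the shorter names used above.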
These summary functions are very useful combined with dplyr::summarise:

Function | Description |
---|---|
min(x) | minimum value |
max(x) | maximum value |
mean(x) | mean value |
sum(x) | sum of values |
var(x) | variance |
sd(x) | standard deviation |
median(x) | median value |
IQR(x) | interquartile range |
One very important fact: in R, you can take the sum and mean of both numbers and logicals (remember typeof?). When a logical is used as a number, TRUE counts as 1 and FALSE counts as 0. Quick aside to show you what this means:
vals <- c(1, 5, 5, 5, NA, 7, NA)
sum(vals)
[1] NA
sum(vals, na.rm = TRUE)
[1] 23
is.na(vals)
[1] FALSE FALSE FALSE FALSE TRUE FALSE TRUE
is.na(vals) %>% as.integer
[1] 0 0 0 0 1 0 1
sum(is.na(vals))
[1] 2
vals == 5
[1] FALSE TRUE TRUE TRUE NA FALSE NA
(vals == 5) %>% as.integer
[1] 0 1 1 1 NA 0 NA
sum(vals == 5, na.rm = TRUE)
[1] 3
Taking the mean of a logical vector returns a proportion.
mean(vals) # actual mean
[1] NA
mean(vals, na.rm = TRUE) # actual mean
[1] 4.6
mean(is.na(vals)) # proportion missing
[1] 0.2857143
mean(vals == 5, na.rm = TRUE) # proportion of 5s
[1] 0.6
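dplyr also provides summary functions of its own:

Function | Description |
---|---|
n() | number of values (rows) in the current group |
n_distinct(x) | number of distinct values in vector |
first(x) | first value in vector |
last(x) | last value in vector |
nth(x, n) | nth value in vector |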
Let's see how this works with summarise
# how many unique destinations?
summary_table <- flights %>%
summarise(tot_flights = n(),
tot_planes = n_distinct(tailnum),
tot_carriers = n_distinct(carrier),
tot_dests = n_distinct(dest),
tot_origins = n_distinct(origin))
summary_table
Source: local data frame [1 x 5]
tot_flights tot_planes tot_carriers tot_dests tot_origins
(int) (int) (int) (int) (int)
1 162049 3023 11 71 2
# chain with tidyr functions
summary_table %>%
gather(key, value) %>%
separate(key, into = c("tot", "entity")) %>%
select(-tot, total = value)
Source: local data frame [5 x 2]
entity total
(chr) (int)
1 flights 162049
2 planes 3023
3 carriers 11
4 dests 71
5 origins 2
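The summary above used n() and n_distinct(); first(), last(), and nth() also work on plain vectors, which makes them easy to try out. A sketch using the first six carriers from the flights output, assuming dplyr is installed:

```r
library(dplyr)

carriers <- c("AS", "US", "UA", "US", "AS", "DL")

n_distinct(carriers)  # 4 distinct carriers: AS, US, UA, DL
first(carriers)       # "AS"
last(carriers)        # "DL"
nth(carriers, 3)      # "UA"
nth(carriers, -2)     # "AS": a negative n counts from the end
length(carriers)      # 6; n() gives this count but only works inside dplyr verbs
```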
tidyr
We'll work with a made-up dataframe:
df <- data.frame(
id = 1:10,
date = as.Date('2015-01-01') + 0:9,
q1_m1_w1 = rnorm(10, 0, 1),
q1_m1_w2 = rnorm(10, 0, 1),
q1_m2_w3 = rnorm(10, 0, 1),
q2_m1_w1 = rnorm(10, 0, 1),
q2_m2_w1 = rnorm(10, 0, 1),
q2_m2_w2 = rnorm(10, 0, 1)
)
# HLO
head(df)
id date q1_m1_w1 q1_m1_w2 q1_m2_w3 q2_m1_w1 q2_m2_w1
1 1 2015-01-01 -0.6345459 1.1822500 -1.48655792 1.59441999 -0.31588531
2 2 2015-01-02 1.7045810 -0.7826462 -1.29774614 0.79825505 -0.27955622
3 3 2015-01-03 0.3266713 0.4755565 1.81680783 0.31805142 0.12165836
4 4 2015-01-04 -2.7061799 -0.1657401 -0.80074130 0.11544395 0.07152752
5 5 2015-01-05 -0.9150028 1.1591777 0.07077055 -0.21279434 -2.04686473
6 6 2015-01-06 1.7184398 2.0473497 -0.31425598 -0.09162879 0.17163420
q2_m2_w2
1 0.6398774
2 -1.1329257
3 -0.8780192
4 0.7658333
5 0.5379359
6 -0.2509163
glimpse(df)
Observations: 10
Variables: 8
$ id (int) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
$ date (date) 2015-01-01, 2015-01-02, 2015-01-03, 2015-01-04, 2015...
$ q1_m1_w1 (dbl) -0.6345459, 1.7045810, 0.3266713, -2.7061799, -0.9150...
$ q1_m1_w2 (dbl) 1.1822500, -0.7826462, 0.4755565, -0.1657401, 1.15917...
$ q1_m2_w3 (dbl) -1.48655792, -1.29774614, 1.81680783, -0.80074130, 0....
$ q2_m1_w1 (dbl) 1.59441999, 0.79825505, 0.31805142, 0.11544395, -0.21...
$ q2_m2_w1 (dbl) -0.31588531, -0.27955622, 0.12165836, 0.07152752, -2....
$ q2_m2_w2 (dbl) 0.6398774, -1.1329257, -0.8780192, 0.7658333, 0.53793...
tidyr::gather
First, let's gather...
df_tidy <- df %>%
gather(key, value, q1_m1_w1:q2_m2_w2)
head(df_tidy)
id date key value
1 1 2015-01-01 q1_m1_w1 -0.6345459
2 2 2015-01-02 q1_m1_w1 1.7045810
3 3 2015-01-03 q1_m1_w1 0.3266713
4 4 2015-01-04 q1_m1_w1 -2.7061799
5 5 2015-01-05 q1_m1_w1 -0.9150028
6 6 2015-01-06 q1_m1_w1 1.7184398
Now let's gather using subtraction...
df_tidy <- df %>%
gather(key, value, -id, -date)
head(df_tidy)
id date key value
1 1 2015-01-01 q1_m1_w1 -0.6345459
2 2 2015-01-02 q1_m1_w1 1.7045810
3 3 2015-01-03 q1_m1_w1 0.3266713
4 4 2015-01-04 q1_m1_w1 -2.7061799
5 5 2015-01-05 q1_m1_w1 -0.9150028
6 6 2015-01-06 q1_m1_w1 1.7184398
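Since tidyr 1.0.0, gather() is superseded by pivot_longer(), which does the same job with more explicit argument names. A sketch, assuming tidyr 1.0.0 or later; set.seed() is added here so the example is reproducible, which means the values will not match the output printed above.

```r
library(tidyr)

set.seed(1)  # added for reproducibility; the original df was unseeded
df <- data.frame(
  id = 1:10,
  date = as.Date("2015-01-01") + 0:9,
  q1_m1_w1 = rnorm(10), q1_m1_w2 = rnorm(10), q1_m2_w3 = rnorm(10),
  q2_m1_w1 = rnorm(10), q2_m2_w1 = rnorm(10), q2_m2_w2 = rnorm(10)
)

# equivalent of gather(key, value, -id, -date)
df_long <- pivot_longer(df, -c(id, date), names_to = "key", values_to = "value")
head(df_long)
```

The result has the same 60 rows and columns as the gather() version, but the rows are ordered differently: pivot_longer() keeps each original row together (all six keys for id 1 first), whereas gather() stacks one column at a time.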
tidyr::separate
# separate 1 col into 3 cols
df_sep <- df_tidy %>%
separate(key, into = c("quarter", "month", "week"))
head(df_sep)
id date quarter month week value
1 1 2015-01-01 q1 m1 w1 -0.6345459
2 2 2015-01-02 q1 m1 w1 1.7045810
3 3 2015-01-03 q1 m1 w1 0.3266713
4 4 2015-01-04 q1 m1 w1 -2.7061799
5 5 2015-01-05 q1 m1 w1 -0.9150028
6 6 2015-01-06 q1 m1 w1 1.7184398
# separate 1 col into 2 cols
df_sep2 <- df_tidy %>%
separate(key, into = c("quarter", "period"), extra = "merge")
head(df_sep2)
id date quarter period value
1 1 2015-01-01 q1 m1_w1 -0.6345459
2 2 2015-01-02 q1 m1_w1 1.7045810
3 3 2015-01-03 q1 m1_w1 0.3266713
4 4 2015-01-04 q1 m1_w1 -2.7061799
5 5 2015-01-05 q1 m1_w1 -0.9150028
6 6 2015-01-06 q1 m1_w1 1.7184398
stringr vs. tidyr separate by regular expression
tidyr::extract
extract is essentially the same as separate, except that it builds the new columns from regular expression capture groups; let's see how...
# extract
df_ext <- df_sep2 %>%
extract(period, into = "month")
head(df_ext)
id date quarter month value
1 1 2015-01-01 q1 m1 -0.6345459
2 2 2015-01-02 q1 m1 1.7045810
3 3 2015-01-03 q1 m1 0.3266713
4 4 2015-01-04 q1 m1 -2.7061799
5 5 2015-01-05 q1 m1 -0.9150028
6 6 2015-01-06 q1 m1 1.7184398
# this gives us same output as separate
df_ext <- df_sep2 %>%
extract(period, into = c("month", "week"),
regex = "([[:alnum:]]+)_([[:alnum:]]+)")
head(df_ext)
id date quarter month week value
1 1 2015-01-01 q1 m1 w1 -0.6345459
2 2 2015-01-02 q1 m1 w1 1.7045810
3 3 2015-01-03 q1 m1 w1 0.3266713
4 4 2015-01-04 q1 m1 w1 -2.7061799
5 5 2015-01-05 q1 m1 w1 -0.9150028
6 6 2015-01-06 q1 m1 w1 1.7184398
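extract() turns each capture group (each parenthesized piece of the regex) into one new column. Before handing a regex to extract(), you can check what each group captures with base R's sub(), where "\\1" and "\\2" refer to the first and second groups. A small sketch:

```r
period <- c("m1_w1", "m1_w2", "m2_w3")
pattern <- "([[:alnum:]]+)_([[:alnum:]]+)"

# [[:alnum:]] does not match "_", so group 1 stops at the underscore
month <- sub(pattern, "\\1", period)
week <- sub(pattern, "\\2", period)

month
week
```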
tidyr::unite
# let's say we want to combine quarter and month with an underscore
df_uni <- df_sep %>%
unite(period, quarter:month) # sep = "_" is the default arg
head(df_uni)
id date period week value
1 1 2015-01-01 q1_m1 w1 -0.6345459
2 2 2015-01-02 q1_m1 w1 1.7045810
3 3 2015-01-03 q1_m1 w1 0.3266713
4 4 2015-01-04 q1_m1 w1 -2.7061799
5 5 2015-01-05 q1_m1 w1 -0.9150028
6 6 2015-01-06 q1_m1 w1 1.7184398
# let's say we want to combine quarter and month with nothing
df_uni <- df_sep %>%
unite(period, quarter:month, sep = "")
head(df_uni)
id date period week value
1 1 2015-01-01 q1m1 w1 -0.6345459
2 2 2015-01-02 q1m1 w1 1.7045810
3 3 2015-01-03 q1m1 w1 0.3266713
4 4 2015-01-04 q1m1 w1 -2.7061799
5 5 2015-01-05 q1m1 w1 -0.9150028
6 6 2015-01-06 q1m1 w1 1.7184398
tidyr::spread
# finally let's spread
df_spread <- df_uni %>%
spread(week, value) # fill = NA is default arg
head(df_spread)
id date period w1 w2 w3
1 1 2015-01-01 q1m1 -0.6345459 1.1822500 NA
2 1 2015-01-01 q1m2 NA NA -1.486558
3 1 2015-01-01 q2m1 1.5944200 NA NA
4 1 2015-01-01 q2m2 -0.3158853 0.6398774 NA
5 2 2015-01-02 q1m1 1.7045810 -0.7826462 NA
6 2 2015-01-02 q1m2 NA NA -1.297746
Gather multiple sets of columns (gather() %>% separate() %>% spread())
All in one, if we had wanted to essentially "gather" three sets of columns (here, one for each week)...
df_tidiest <- df %>%
gather(key, value, -id, -date) %>%
separate(key, into = c("quarter", "month", "week")) %>%
spread(week, value)
head(df_tidiest)
id date quarter month w1 w2 w3
1 1 2015-01-01 q1 m1 -0.6345459 1.1822500 NA
2 1 2015-01-01 q1 m2 NA NA -1.486558
3 1 2015-01-01 q2 m1 1.5944200 NA NA
4 1 2015-01-01 q2 m2 -0.3158853 0.6398774 NA
5 2 2015-01-02 q1 m1 1.7045810 -0.7826462 NA
6 2 2015-01-02 q1 m2 NA NA -1.297746
Anscombe's data is available in the datasets package. This package comes preinstalled with R and is loaded by default, so there is nothing to install. You can see all the datasets available to you by typing data() into your console.
data("anscombe") # load the dataframe
It is not tidy...
head(anscombe)
x1 x2 x3 x4 y1 y2 y3 y4
1 10 10 10 8 8.04 9.14 7.46 6.58
2 8 8 8 8 6.95 8.14 6.77 5.76
3 13 13 13 8 7.58 8.74 12.74 7.71
4 9 9 9 8 8.81 8.77 7.11 8.84
5 11 11 11 8 8.33 9.26 7.81 8.47
6 14 14 14 8 9.96 8.10 8.84 7.04
We would like to be able to make a table like this:
observation | set | x | y |
---|---|---|---|
1 | I | 10 | 8.04 |
2 | I | 8 | 6.95 |
3 | I | 13 | 7.58 |
4 | I | 9 | 8.81 |
5 | I | 11 | 8.33 |
6 | I | 14 | 9.96 |
Or make a plot like this using ggplot2:
In order to make these types of plots, we need to do some tidyr and dplyr legwork. Your challenge: tidy this dataset such that the column names are observation, set, x, and y.
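Break it down into manageable steps on paper first. Perhaps more helpful, here are the steps with hints:

1. Add a column called observation (dplyr::mutate) (hint: seq_along is a cool function)
2. Gather the x and y columns into key-value pairs (tidyr::gather)
3. Separate each key into a variable (x or y) and a set number (tidyr::separate)
4. Convert the set column from integers into roman numerals (dplyr::mutate) (hint!: as.roman is a cool function)
5. Spread the variable column back out into x and y columns (tidyr::spread)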
Here is just one of many possible ways to solve this:
anscombe_tidy <- anscombe %>%
mutate(observation = seq_along(x1)) %>%
gather(key, value, -observation) %>%
separate(key, into = c("variable", "set"), 1) %>%
mutate(set = as.roman(set)) %>%
spread(variable, value) %>%
arrange(set)
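A quick sanity check on the reshaped data: Anscombe's four sets are known for sharing nearly identical summary statistics, so each set should have 11 observations and matching means. The sketch below repeats the solution's steps but, as an assumption made here to keep grouping and sorting simple, stores set as a plain character string ("I", "II", ...) rather than a "roman" object; anscombe_chk is a hypothetical name.

```r
library(dplyr)
library(tidyr)

data("anscombe")

# Same reshaping steps as the solution, but `set` ends up as character
anscombe_chk <- anscombe %>%
  mutate(observation = seq_along(x1)) %>%
  gather(key, value, -observation) %>%
  separate(key, into = c("variable", "set"), sep = 1) %>%   # "x1" -> "x", "1"
  mutate(set = as.character(as.roman(as.integer(set)))) %>%
  spread(variable, value) %>%
  arrange(set, observation)

# 11 observations x 4 sets = 44 rows
nrow(anscombe_chk)

# Per-set summaries: the means and correlations are (nearly) the same
anscombe_chk %>%
  group_by(set) %>%
  summarise(mean_x = mean(x), mean_y = mean(y), cor_xy = cor(x, y))
```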
broom
"The broom package takes the messy output of built-in functions in R, such as lm, nls, or t.test, and turns them into tidy data frames." In other words, broom tidies the untidy output of other R functions.
See here for a list of supported functions: https://github.com/dgrtwo/broom
Vignette: ftp://cran.r-project.org/pub/R/web/packages/broom/vignettes/broom.html
fit <- lm(mpg ~ qsec + factor(am) + wt + factor(gear),
data = mtcars)
Un-tidy output from lm
summary(fit)
Call:
lm(formula = mpg ~ qsec + factor(am) + wt + factor(gear), data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.5064 -1.5220 -0.7517 1.3841 4.6345
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.3650 8.3730 1.118 0.27359
qsec 1.2449 0.3828 3.252 0.00317 **
factor(am)1 3.1505 1.9405 1.624 0.11654
wt -3.9263 0.7428 -5.286 1.58e-05 ***
factor(gear)4 -0.2682 1.6555 -0.162 0.87257
factor(gear)5 -0.2697 2.0632 -0.131 0.89698
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.55 on 26 degrees of freedom
Multiple R-squared: 0.8498, Adjusted R-squared: 0.8209
F-statistic: 29.43 on 5 and 26 DF, p-value: 6.379e-10
Tidy output from broom
tidy(fit)
term estimate std.error statistic p.value
1 (Intercept) 9.3650443 8.3730161 1.1184792 2.735903e-01
2 qsec 1.2449212 0.3828479 3.2517387 3.168128e-03
3 factor(am)1 3.1505178 1.9405171 1.6235455 1.165367e-01
4 wt -3.9263022 0.7427562 -5.2861251 1.581735e-05
5 factor(gear)4 -0.2681630 1.6554617 -0.1619868 8.725685e-01
6 factor(gear)5 -0.2697468 2.0631829 -0.1307430 8.969850e-01
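Because tidy() returns an ordinary data frame, its output can be piped straight into dplyr verbs. A minimal sketch, assuming broom and dplyr are installed: keep only the terms with p < 0.05, and use broom's glance() for a one-row model summary. Based on the output above, only qsec and wt should pass the filter.

```r
library(broom)
library(dplyr)

fit <- lm(mpg ~ qsec + factor(am) + wt + factor(gear), data = mtcars)

coefs <- tidy(fit)              # one row per model term
coefs %>% filter(p.value < 0.05) # terms significant at the 5% level

glance(fit)                      # one row: r.squared, adj.r.squared, sigma, ...
```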
DT package
An excellent tutorial on DT is available at https://rstudio.github.io/DT/.
datatable(iris)
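datatable() also accepts arguments that control the widget. A small sketch, assuming the DT package is installed: hide row names, add a filter box above each column, and show 5 rows per page (the DataTables pageLength option). The result is an htmlwidget, so it is interactive in the RStudio viewer and in HTML output; it will not render as an interactive table in a PDF.

```r
library(DT)

tbl <- datatable(
  iris,
  rownames = FALSE,                 # drop the 1..150 row-name column
  filter = "top",                   # filter box above each column
  options = list(pageLength = 5)    # 5 rows per page
)

tbl  # prints the interactive widget
```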
kable function in the knitr package
Documentation: https://www.rdocumentation.org/packages/knitr/versions/1.12.3/topics/kable
kable(head(iris))
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa |
4.9 | 3.0 | 1.4 | 0.2 | setosa |
4.7 | 3.2 | 1.3 | 0.2 | setosa |
4.6 | 3.1 | 1.5 | 0.2 | setosa |
5.0 | 3.6 | 1.4 | 0.2 | setosa |
5.4 | 3.9 | 1.7 | 0.4 | setosa |
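kable() takes a few arguments that cover most day-to-day formatting needs. A sketch, assuming knitr is installed: round to one decimal place, rename the columns, and add a caption. format = "pipe" is set explicitly so the console shows the same Markdown that pandoc will render; inside a knitted document you can usually leave format out.

```r
library(knitr)

k <- kable(
  head(iris),
  format = "pipe",     # Markdown pipe table
  digits = 1,          # round numeric columns
  col.names = c("Sepal length", "Sepal width", "Petal length",
                "Petal width", "Species"),
  caption = "First six rows of the iris data"
)

k
```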
xtable package (best for HTML)
The xtable package produces both HTML and LaTeX output. Its syntax is very similar to kable's:
library(xtable)

output <-
matrix(sprintf("Content %s", LETTERS[1:4]),
ncol=2, byrow=TRUE)
colnames(output) <-
c("1st header", "2nd header")
rownames(output) <-
c("1st row", "2nd row")
print(xtable(output,
caption="A test table",
align = c("l", "c", "r")),
type="html")
<!-- html table generated in R 3.2.3 by xtable 1.8-2 package -->
<!-- Thu Oct 27 16:09:49 2016 -->
<table border=1>
<caption align="bottom"> A test table </caption>
<tr> <th> </th> <th> 1st header </th> <th> 2nd header </th> </tr>
<tr> <td> 1st row </td> <td align="center"> Content A </td> <td align="right"> Content B </td> </tr>
<tr> <td> 2nd row </td> <td align="center"> Content C </td> <td align="right"> Content D </td> </tr>
</table>
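Since xtable also targets LaTeX, the same table can be printed with type = "latex" when knitting to PDF. A minimal sketch, assuming the xtable package is installed; capture.output() is used here only so the generated lines can be inspected, and in a knitted chunk you would call print() directly with results = 'asis'.

```r
library(xtable)

output <- matrix(sprintf("Content %s", LETTERS[1:4]), ncol = 2, byrow = TRUE)
colnames(output) <- c("1st header", "2nd header")
rownames(output) <- c("1st row", "2nd row")

# Same table as above, emitted as a LaTeX tabular instead of HTML
latex_lines <- capture.output(
  print(xtable(output, caption = "A test table", align = c("l", "c", "r")),
        type = "latex")
)

cat(latex_lines, sep = "\n")
```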
Note that for the table to render when you knit, you need to set the chunk option results = 'asis':
print(xtable(output,
caption="A test table",
align = c("l", "c", "r")),
type="html")
| | 1st header | 2nd header |
---|---|---|
1st row | Content A | Content B |
2nd row | Content C | Content D |
print(xtable(head(iris)), type = 'html', html.table.attributes = '')
| | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|---|
1 | 5.10 | 3.50 | 1.40 | 0.20 | setosa |
2 | 4.90 | 3.00 | 1.40 | 0.20 | setosa |
3 | 4.70 | 3.20 | 1.30 | 0.20 | setosa |
4 | 4.60 | 3.10 | 1.50 | 0.20 | setosa |
5 | 5.00 | 3.60 | 1.40 | 0.20 | setosa |
6 | 5.40 | 3.90 | 1.70 | 0.40 | setosa |
pixiedust package (best for PDF)
Remember the broom package we used earlier? We can make its output table look much better:
tidy(fit)
term estimate std.error statistic p.value
1 (Intercept) 9.3650443 8.3730161 1.1184792 2.735903e-01
2 qsec 1.2449212 0.3828479 3.2517387 3.168128e-03
3 factor(am)1 3.1505178 1.9405171 1.6235455 1.165367e-01
4 wt -3.9263022 0.7427562 -5.2861251 1.581735e-05
5 factor(gear)4 -0.2681630 1.6554617 -0.1619868 8.725685e-01
6 factor(gear)5 -0.2697468 2.0631829 -0.1307430 8.969850e-01
pixiedust vignette: https://cran.r-project.org/web/packages/pixiedust/vignettes/pixiedust.html
dust(fit) %>%
sprinkle(cols = "term",
replace = c("Intercept", "Quarter Mile Time", "Automatic vs. Manual",
"Weight", "Gears: 4 vs. 3", "Gears: 5 vs 3")) %>%
sprinkle(cols = c("estimate", "std.error", "statistic"),
round = 3) %>%
sprinkle(cols = "p.value", fn = quote(pvalString(value))) %>%
sprinkle_colnames("Term", "Coefficient", "SE", "T-statistic", "P-value")
Term | Coefficient | SE | T-statistic | P-value |
---|---|---|---|---|
Intercept | 9.365 | 8.373 | 1.118 | 0.27 |
Quarter Mile Time | 1.245 | 0.383 | 3.252 | 0.003 |
Automatic vs. Manual | 3.151 | 1.941 | 1.624 | 0.12 |
Weight | -3.926 | 0.743 | -5.286 | < 0.001 |
Gears: 4 vs. 3 | -0.268 | 1.655 | -0.162 | 0.87 |
Gears: 5 vs 3 | -0.27 | 2.063 | -0.131 | 0.9 |
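If pixiedust is not available, a similar table can be assembled from tools already used above. This is a substitute technique, not pixiedust itself: broom's tidy() for the coefficients, dplyr for relabeling and rounding, base R's format.pval() for the p-values, and knitr's kable() for rendering. Note that the relabeling assumes the term order shown in tidy(fit) above.

```r
library(broom)
library(dplyr)
library(knitr)

fit <- lm(mpg ~ qsec + factor(am) + wt + factor(gear), data = mtcars)

pretty_coefs <- tidy(fit) %>%
  mutate(
    # Relabel by position: assumes the term order printed by tidy(fit)
    term = c("Intercept", "Quarter Mile Time", "Automatic vs. Manual",
             "Weight", "Gears: 4 vs. 3", "Gears: 5 vs. 3"),
    # Small p-values are shown as "< 0.001" (or "<0.001")
    p.value = format.pval(p.value, digits = 2, eps = 0.001)
  ) %>%
  mutate(across(c(estimate, std.error, statistic), ~ round(.x, 3)))

kable(pretty_coefs,
      col.names = c("Term", "Coefficient", "SE", "T-statistic", "P-value"))
```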