- Intro
- Installation
- Pivot table
- Pivot table operations
- Define dimensions and measures
- Database, Spark and
data.table
connections - pivottabler
Create pivot tables with commonly used terms as commands such as:
pivot_rows()
, pivot_columns()
and pivot_values()
, and string them
together with a pipe (%>%
).
The idea is that the creation of a pivot table is done using code, as
opposed to drag-and-drop. This means that actions such as pivot_flip()
and pivot_drill()
are also possible, and performed using R commands.
Another goal of pivotable
is to provide a framework to easily define
prep_measures()
and prep_dimensions()
of your data. The resulting R
object can then be used as the source of the pivot table. This should
make it possible to create consistent analysis, and subsequent reporting
of the data.
# install.packages("remotes")
remotes::install_github("edgararuiz/pivotable")
The pivot_values()
function is used to add an aggregation in the pivot
table. When used against a data frame, you will have to provide an
aggregation formula of a field, or fields, within the data. If using a
pre-defined set of dimensions and measures, then simply call the desired
measure field, no need to re-aggregate. For more info see Define
dimensions and measures.
library(dplyr)
library(pivotable)
retail_orders %>%
pivot_values(sum(sales))
#> sum(sales)
#> 10032628.85
Multiple aggregations are supported by pivot_values()
retail_orders %>%
pivot_values(sum(sales), n())
#> sum(sales) n()
#> 10032628.85 307
The aggregations can also be named inside pivot_values()
retail_orders %>%
pivot_values(total_sales = sum(sales), no_sales = n())
#> total_sales no_sales
#> 10032628.85 307
As its name indicates, pivot_rows()
adds a data grouping based on the
variable or variables passed to the function. The aggregation is split
by the variable(s) and each total is displayed by row.
retail_orders %>%
pivot_rows(country) %>%
pivot_values(sum(sales))
#> sum(sales)
#> Australia 630623.1
#> Austria 202062.53
#> Belgium 108412.62
#> Canada 224078.56
#> Denmark 245637.15
#> Finland 329581.91
#> France 1110916.52
#> Germany 220472.09
#> Ireland 57756.43
#> Italy 374674.31
#> Japan 188167.81
#> Norway 307463.7
#> Philippines 94015.73
#> Singapore 288488.41
#> Spain 1215686.92
#> Sweden 210014.21
#> Switzerland 117713.56
#> UK 478880.46
#> USA 3627982.83
As its name indicates, pivot_rows()
adds a data grouping based on the
variable or variables passed to the function. The aggregation is split
by the variable(s) and each total is displayed by column.
retail_orders %>%
pivot_rows(country) %>%
pivot_columns(status) %>%
pivot_values(sum(sales))
#> Cancelled Disputed In Process On Hold Resolved Shipped
#> Australia 14378.09 43971.43 572273.58
#> Austria 28550.59 173511.94
#> Belgium 8411.95 100000.67
#> Canada 224078.56
#> Denmark 26012.87 24078.61 195545.67
#> Finland 329581.91
#> France 43784.69 1067131.83
#> Germany 220472.09
#> Ireland 57756.43
#> Italy 374674.31
#> Japan 188167.81
#> Norway 307463.7
#> Philippines 94015.73
#> Singapore 288488.41
#> Spain 50010.65 31821.9 35133.34 53815.72 1044905.31
#> Sweden 48710.92 26260.21 135043.08
#> Switzerland 117713.56
#> UK 50408.25 428472.21
#> USA 45357.66 13428.55 152718.98 44273.36 3372204.28
Instead of “manually” switching the content of pivot_rows()
and
pivot_columns()
, specially during data exploration, simply pipe the
code to the pivot_flip()
command.
retail_orders %>%
pivot_rows(country) %>%
pivot_columns(status) %>%
pivot_values(sum(sales)) %>%
pivot_flip()
#> Australia Austria Belgium Canada Denmark Finland France Germany Ireland Italy Japan Norway Philippines Singapore Spain Sweden Switzerland UK USA
#> Cancelled 50010.65 48710.92 50408.25 45357.66
#> Disputed 14378.09 26012.87 31821.9
#> In Process 43971.43 8411.95 43784.69 35133.34 13428.55
#> On Hold 26260.21 152718.98
#> Resolved 28550.59 24078.61 53815.72 44273.36
#> Shipped 572273.58 173511.94 100000.67 224078.56 195545.67 329581.91 1067131.83 220472.09 57756.43 374674.31 188167.81 307463.7 94015.73 288488.41 1044905.31 135043.08 117713.56 428472.21 3372204.28
pivotable
also includes support for the t()
method from base R. It
will perform the exact same operation as pivot_flip()
retail_orders %>%
pivot_rows(country) %>%
pivot_columns(status) %>%
pivot_values(sum(sales)) %>%
t()
#> Australia Austria Belgium Canada Denmark Finland France Germany Ireland Italy Japan Norway Philippines Singapore Spain Sweden Switzerland UK USA
#> Cancelled 50010.65 48710.92 50408.25 45357.66
#> Disputed 14378.09 26012.87 31821.9
#> In Process 43971.43 8411.95 43784.69 35133.34 13428.55
#> On Hold 26260.21 152718.98
#> Resolved 28550.59 24078.61 53815.72 44273.36
#> Shipped 572273.58 173511.94 100000.67 224078.56 195545.67 329581.91 1067131.83 220472.09 57756.43 374674.31 188167.81 307463.7 94015.73 288488.41 1044905.31 135043.08 117713.56 428472.21 3372204.28
To limit the pivot table to display only a subset of the pivot table,
use pivot_focus()
retail_orders %>%
pivot_rows(country) %>%
pivot_columns(status) %>%
pivot_values(total_sales = sum(sales)) %>%
pivot_focus(
country %in% c("Japan", "USA", "UK"),
status == "Shipped",
total_sales > 200000
)
#> Shipped
#> UK 428472.21
#> USA 3372204.28
pivotable
also supports dplyr
’s filter()
command.
retail_orders %>%
pivot_rows(country) %>%
pivot_columns(status) %>%
pivot_values(total_sales = sum(sales)) %>%
filter(
country %in% c("Japan", "USA", "UK"),
status == "Shipped",
total_sales > 200000
)
#> Shipped
#> UK 428472.21
#> USA 3372204.28
Another powerful thing of pivot tables is the ability to drill down into
the data. To do this in pivotable
, you will need to define a hierarchy
dimension using the dim_hierarchy()
command. That command is made to
be called within one of the dimension definition functions, such as
pivot_rows()
or pivot_columns()
. The order of the hierarchy is
defined by the order in which the variables is passed to the function.
retail_orders %>%
pivot_rows(order_date = dim_hierarchy(
year = as.integer(format(orderdate, "%Y")),
month = as.integer(format(orderdate, "%m"))
)
) %>%
pivot_values(sum(sales))
#> sum(sales)
#> 2003 3516979.54
#> 2004 4724162.6
#> 2005 1791486.71
The pivot_drill()
command will add the next level of the hierarchy
dimension to the pivot table.
retail_orders %>%
pivot_rows(order_date = dim_hierarchy(
year = as.integer(format(orderdate, "%Y")),
month = as.integer(format(orderdate, "%m"))
)
) %>%
pivot_values(sum(sales)) %>%
pivot_drill(order_date)
#> sum(sales)
#> 2003 1 129753.6
#> 2 140836.19
#> 3 174504.9
#> 4 201609.55
#> 5 192673.11
#> 6 168082.56
#> 7 187731.88
#> 8 197809.3
#> 9 263973.36
#> 10 568290.97
#> 11 1029837.66
#> 12 261876.46
#> 2004 1 316577.42
#> 2 311419.53
#> 3 205733.73
#> 4 206148.12
#> 5 273438.39
#> 6 286674.22
#> 7 327144.09
#> 8 461501.27
#> 9 320750.91
#> 10 552924.25
#> 11 1089048.01
#> 12 372802.66
#> 2005 1 339543.42
#> 2 358186.18
#> 3 374262.76
#> 4 261633.29
#> 5 457861.06
A helper function called dim_hierarchy_mqy()
creates a three level
date hierarchy: year, quarter and month. The function will create the
formulas to calculate each level, but those formulas will not be
evaluated until the drilling into the level. The formulas are generic
enough to work on database back-ends.
retail_orders %>%
pivot_rows(order_date = dim_hierarchy_mqy(orderdate)) %>%
pivot_values(sum(sales)) %>%
pivot_drill(order_date)
#> sum(sales)
#> 2003 1 445094.69
#> 2 562365.22
#> 3 649514.54
#> 4 1860005.09
#> 2004 1 833730.68
#> 2 766260.73
#> 3 1109396.27
#> 4 2014774.92
#> 2005 1 1071992.36
#> 2 719494.35
The display of the totals can be controlled using pivot_totals()
. It
is possible to control the display of row and column totals
individually. Currently, sub-totals and grand totals are sum()
aggregates, so if the actual calculations are not record counts, or
another sum()
consider leaving them off. For example, if the
calculation is a mean()
, then the sub-total and grand total will be
the aggregate sum()
total of the averages, which would be incorrect.
retail_orders %>%
pivot_rows(status, country) %>%
pivot_values(sum(sales)) %>%
pivot_totals(
include_row_totals = TRUE
)
#> sum(sales)
#> Cancelled Spain 50010.65
#> Sweden 48710.92
#> UK 50408.25
#> USA 45357.66
#> Total 194487.48
#> Disputed Australia 14378.09
#> Denmark 26012.87
#> Spain 31821.9
#> Total 72212.86
#> In Process Australia 43971.43
#> Belgium 8411.95
#> France 43784.69
#> Spain 35133.34
#> USA 13428.55
#> Total 144729.96
#> On Hold Sweden 26260.21
#> USA 152718.98
#> Total 178979.19
#> Resolved Austria 28550.59
#> Denmark 24078.61
#> Spain 53815.72
#> USA 44273.36
#> Total 150718.28
#> Shipped Australia 572273.58
#> Austria 173511.94
#> Belgium 100000.67
#> Canada 224078.56
#> Denmark 195545.67
#> Finland 329581.91
#> France 1067131.83
#> Germany 220472.09
#> Ireland 57756.43
#> Italy 374674.31
#> Japan 188167.81
#> Norway 307463.7
#> Philippines 94015.73
#> Singapore 288488.41
#> Spain 1044905.31
#> Sweden 135043.08
#> Switzerland 117713.56
#> UK 428472.21
#> USA 3372204.28
#> Total 9291501.08
#> Total 10032628.85
A default can be set to control if the column or row totals are displayed. The default is set to not show.
pivot_default_totals(
include_column_totals = TRUE,
include_row_totals = TRUE
)
retail_orders %>%
pivot_rows(status, country) %>%
pivot_values(sum(sales))
#> sum(sales)
#> Cancelled Spain 50010.65
#> Sweden 48710.92
#> UK 50408.25
#> USA 45357.66
#> Total 194487.48
#> Disputed Australia 14378.09
#> Denmark 26012.87
#> Spain 31821.9
#> Total 72212.86
#> In Process Australia 43971.43
#> Belgium 8411.95
#> France 43784.69
#> Spain 35133.34
#> USA 13428.55
#> Total 144729.96
#> On Hold Sweden 26260.21
#> USA 152718.98
#> Total 178979.19
#> Resolved Austria 28550.59
#> Denmark 24078.61
#> Spain 53815.72
#> USA 44273.36
#> Total 150718.28
#> Shipped Australia 572273.58
#> Austria 173511.94
#> Belgium 100000.67
#> Canada 224078.56
#> Denmark 195545.67
#> Finland 329581.91
#> France 1067131.83
#> Germany 220472.09
#> Ireland 57756.43
#> Italy 374674.31
#> Japan 188167.81
#> Norway 307463.7
#> Philippines 94015.73
#> Singapore 288488.41
#> Spain 1044905.31
#> Sweden 135043.08
#> Switzerland 117713.56
#> UK 428472.21
#> USA 3372204.28
#> Total 9291501.08
#> Total 10032628.85
pivot_default_totals(FALSE, FALSE)
By themselves, pivot_rows()
and pivot_columns()
will only provide a
list of the unique values of the data frame. There is a way to setup a
default aggregation by using pivot_default_values()
pivot_default_values(n())
retail_orders %>%
pivot_rows(status)
#> n()
#> Cancelled 4
#> Disputed 3
#> In Process 6
#> On Hold 4
#> Resolved 4
#> Shipped 286
The as_tibble()
function is supported to convert the resulting pivot
table into a rectangular tibble()
retail_orders %>%
pivot_rows(country) %>%
pivot_columns(status) %>%
pivot_values(sum(sales)) %>%
as_tibble()
#> # A tibble: 19 x 7
#> row_name Cancelled Disputed `In Process` `On Hold` Resolved Shipped
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Australia NA 14378. 43971. NA NA 572274.
#> 2 Austria NA NA NA NA 28551. 173512.
#> 3 Belgium NA NA 8412. NA NA 100001.
#> 4 Canada NA NA NA NA NA 224079.
#> 5 Denmark NA 26013. NA NA 24079. 195546.
#> 6 Finland NA NA NA NA NA 329582.
#> 7 France NA NA 43785. NA NA 1067132.
#> 8 Germany NA NA NA NA NA 220472.
#> 9 Ireland NA NA NA NA NA 57756.
#> 10 Italy NA NA NA NA NA 374674.
#> 11 Japan NA NA NA NA NA 188168.
#> 12 Norway NA NA NA NA NA 307464.
#> 13 Philippines NA NA NA NA NA 94016.
#> 14 Singapore NA NA NA NA NA 288488.
#> 15 Spain 50011. 31822. 35133. NA 53816. 1044905.
#> 16 Sweden 48711. NA NA 26260. NA 135043.
#> 17 Switzerland NA NA NA NA NA 117714.
#> 18 UK 50408. NA NA NA NA 428472.
#> 19 USA 45358. NA 13429. 152719. 44273. 3372204.
With pivotable
, it is possible to pre-define a set of dimensions and
measures that can then be easily accesses and re-used by pivot tables.
The idea is to provide a way to centralize data definitions, which
creates a consistent reporting.
orders <- retail_orders %>%
prep_dimensions(
order_date = dim_hierarchy(
year = as.integer(format(orderdate, "%Y")),
month = as.integer(format(orderdate, "%m"))
),
status,
country
) %>%
prep_measures(
orders_qty = n(),
order_total = sum(sales),
sales_qty = sum(ifelse(status %in% c("In Process", "Shipped"), 1, 0)),
sales_total = sum(ifelse(status %in% c("In Process", "Shipped"), sales, 0))
)
orders
#> pivot_prep
#> Dimensions
#> **Name** - **Field / Calc**
#> order_date - dim_hierarchy(...)
#> status - status
#> country - country
#> Measures
#> **Name** - **Aggregate Calc**
#> orders_qty - n()
#> order_total - sum(sales)
#> sales_qty - sum(ifelse(status %in% c("In Process", "Shipped"), 1, 0))
#> sales_total - sum(ifelse(status %in% c("In Process", "Shipped"), sales, 0))
orders %>%
pivot_rows(status) %>%
pivot_columns(order_date) %>%
pivot_values(sales_total)
#> 2003 2004 2005
#> Cancelled 0 0
#> Disputed 0
#> In Process 144729.96
#> On Hold 0 0
#> Resolved 0 0 0
#> Shipped 3439718.03 4528047.22 1323735.83
orders %>%
pivot_rows(status) %>%
pivot_columns(order_date) %>%
pivot_values(sales_total) %>%
pivot_drill(order_date)
#> 2003 2004 2005
#> 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5
#> Cancelled 0 0 0
#> Disputed 0 0
#> In Process 144729.96
#> On Hold 0 0 0
#> Resolved 0 0 0 0
#> Shipped 129753.6 140836.19 174504.9 201609.55 192673.11 168082.56 187731.88 197809.3 263973.36 491029.46 1029837.66 261876.46 316577.42 311419.53 205733.73 206148.12 228080.73 186255.32 327144.09 461501.27 320750.91 552924.25 1038709.19 372802.66 295270.06 358186.18 320447.04 131218.33 218614.22
Because pivotable
uses dplyr
commands to create the aggregations.
This allows pivotable
to take advantage of the same integration that
dplyr
has, such as Spark, databases and data.table
.
library(DBI)
library(RSQLite)
con <- dbConnect(SQLite(), ":memory:")
tbl_sales <- copy_to(con, retail_orders)
tbl_sales %>%
pivot_columns(status) %>%
pivot_rows(country) %>%
pivot_values(sum(sales, na.rm = TRUE))
#> Cancelled Disputed In Process On Hold Resolved Shipped
#> Australia 14378.09 43971.43 572273.58
#> Austria 28550.59 173511.94
#> Belgium 8411.95 100000.67
#> Canada 224078.56
#> Denmark 26012.87 24078.61 195545.67
#> Finland 329581.91
#> France 43784.69 1067131.83
#> Germany 220472.09
#> Ireland 57756.43
#> Italy 374674.31
#> Japan 188167.81
#> Norway 307463.7
#> Philippines 94015.73
#> Singapore 288488.41
#> Spain 50010.65 31821.9 35133.34 53815.72 1044905.31
#> Sweden 48710.92 26260.21 135043.08
#> Switzerland 117713.56
#> UK 50408.25 428472.21
#> USA 45357.66 13428.55 152718.98 44273.36 3372204.28
It is also possible create a data definition against a database
connection. The prep_dimensions()
and prep_measures()
calculations
will not be send to the database until used in the pivot table.
orders_db <- tbl_sales %>%
prep_dimensions(
status,
country
) %>%
prep_measures(
orders_qty = n(),
order_total = sum(sales, na.rm = TRUE),
sales_qty = sum(ifelse(status %in% c("In Process", "Shipped"), 1, 0), na.rm = TRUE),
sales_total = sum(ifelse(status %in% c("In Process", "Shipped"), sales, 0), na.rm = TRUE)
)
orders_db %>%
pivot_columns(status) %>%
pivot_values(sales_total)
#> Cancelled Disputed In Process On Hold Resolved Shipped
#> 0 0 144729.96 0 0 9291501.08
dbDisconnect(con)
pivotable
uses pivottabler
to print the pivot table into the R
console. The to_pivottabler()
returns the actual pivottabler
object.
This allows you to further customize the pivot table using that
package’s API.
pt <- orders %>%
pivot_rows(order_date) %>%
pivot_values(orders_qty) %>%
to_pivottabler()
pt$asMatrix(repeatHeaders = TRUE, includeHeaders = TRUE)
#> [,1] [,2]
#> [1,] "" "orders_qty"
#> [2,] "2003" "104"
#> [3,] "2004" "144"
#> [4,] "2005" "59"