Row-oriented workflows in R with the tidyverse

Materials for the RStudio webinar (recording available at this link!):

Thinking inside the box: you can do that inside a data frame?!
Jenny Bryan
Wednesday, April 11 at 1:00pm ET / 10:00am PT
rstd.io/row-work <-- shortlink to this repo
Slides available on SpeakerDeck

Abstract

The data frame is a crucial data structure in R and, especially, in the tidyverse. Working on a column or a variable is a very natural operation, which is great. But what about row-oriented work? That also comes up frequently and is more awkward. In this webinar I’ll work through concrete code examples, exploring patterns that arise in data analysis. We’ll discuss the general notion of "split-apply-combine", row-wise work in a data frame, splitting vs. nesting, and list-columns.

Code examples

Beginner --> intermediate --> advanced
Not all are used in webinar

Leave your data in that big, beautiful data frame (ex01_leave-it-in-the-data-frame). Show the evil of creating copies of certain rows of certain variables, using Magic Numbers and cryptic names, just to save some typing.
Adding or modifying variables (ex02_create-or-mutate-in-place). df$var <- ... versus dplyr::mutate(). Recycling/safety, the data frame as data mask, aesthetics.
Are you SURE you need to iterate over rows? (ex03_row-wise-iteration-are-you-sure) Don't fixate on the most obvious generalization of your pilot example and risk overlooking a vectorized solution. Features a paste() example, then goes out with some glue glory.
Working with non-vectorized functions (ex04_map-example). Small example using purrr::map() to apply nrow() to a list of data frames.
Row-wise thinking vs. column-wise thinking (ex05_attack-via-rows-or-columns). Data rectangling example. Both are possible, but I find building a tibble column-by-column is less aggravating than building rows, then row binding.
Iterate over rows of a data frame (iterate-over-rows). Empirical study of reshaping a data frame into this form: a list with one component per row. Revisiting a study originally done by Winston Chang. Run times for different numbers of rows or columns.
Generate data from different distributions via purrr::pmap() (ex06_runif-via-pmap). Use purrr::pmap() to generate U[min, max] data for various combinations of (n, min, max), stored as rows of a data frame.
Are you SURE you need to iterate over groups? (ex07_group-by-summarise) Use dplyr::group_by() and dplyr::summarise() to compute group-wise summaries, without explicitly splitting up the data frame and re-combining the results. Use list() to package multivariate summaries into something summarise() can handle, creating a list-column.
Group-and-nest (ex08_nesting-is-good). How to explicitly work on groups of rows via nesting (our recommendation) vs. splitting.
Row-wise mean or sum (ex09_row-summaries). How to do rowSums()-y and rowMeans()-y work inside a data frame.
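
To give a flavor of the "are you SURE you need to iterate over rows?" point, here is a minimal sketch in base R. The toy data frame and its column names are hypothetical and are not taken from the repo's ex03 file; the point is only that paste() is already vectorized, so the loop is unnecessary.

```r
# Hypothetical toy data, not taken from the repo
df <- data.frame(
  name = c("Alice", "Bob", "Carol"),
  age = c(26, 3, 41),
  stringsAsFactors = FALSE
)

# Row-by-row version: works, but more code than needed
by_row <- character(nrow(df))
for (i in seq_len(nrow(df))) {
  by_row[i] <- paste(df$name[i], "is", df$age[i], "years old")
}

# Vectorized version: paste() operates element-wise on whole columns
vectorized <- paste(df$name, "is", df$age, "years old")

# Both approaches produce the same character vector
identical(by_row, vectorized)
```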
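
The purrr::pmap() idea can be sketched roughly as follows. The parameter values here are illustrative assumptions, not the ones used in ex06; what matters is that the column names (n, min, max) match the argument names of runif(), so pmap() can call it once per row.

```r
library(tibble)
library(purrr)

# Hypothetical parameter table: each row describes one sample to draw
params <- tribble(
  ~n, ~min, ~max,
   1,    0,    1,
   2,   10,  100,
   3,  100, 1000
)

set.seed(123)

# pmap() calls runif(n = , min = , max = ) once per row,
# matching columns to arguments by name
draws <- pmap(params, runif)

# One list element per row, with as many values as that row's n
lengths(draws)

# The results can sit next to their parameters as a list-column
params$data <- draws
```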
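
For the group-wise items, a small sketch of both patterns may help. It uses the built-in iris data purely as an illustrative choice, and the summary column names are made up for this example. The first part packages a multivariate summary with list(); the second nests each group's rows into a list-column.

```r
library(dplyr)
library(tidyr)

# Group-wise summary without splitting and re-combining by hand
summ <- iris %>%
  group_by(Species) %>%
  summarise(
    mean_petal = mean(Petal.Length),
    # list() wraps the length-3 quantile result so it fits in one cell
    quartiles = list(quantile(Petal.Length, c(0.25, 0.5, 0.75)))
  )

# Nesting: one row per group, with that group's rows
# stored in a list-column named `data`
nested <- iris %>%
  group_by(Species) %>%
  nest()

nrow(nested)           # one row per species
nrow(nested$data[[1]]) # rows belonging to the first species
```

Keeping the per-group data inside the data frame means the grouping variable travels with the data, which is the main argument for nesting over split().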
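
Row summaries inside a data frame might look like the sketch below. The data and column names are hypothetical, and this is one possible approach rather than necessarily the one shown in ex09: select the numeric columns, then hand them to rowMeans() or rowSums() inside mutate().

```r
library(tibble)
library(dplyr)

# Hypothetical scores: one row per person, one column per trial
scores <- tribble(
  ~name,  ~t1, ~t2, ~t3,
  "Abby",   1,   2,   3,
  "Bess",   4,   5,   6,
  "Carl",   7,   8,   9
)

# `.` is the data frame piped in by %>%; select() keeps only the trial
# columns so that rowMeans() and rowSums() see numeric data
scores <- scores %>%
  mutate(
    t_avg = rowMeans(select(., t1:t3)),
    t_sum = rowSums(select(., t1:t3))
  )
```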
