My thoughts on DuckDB and R with examples

R packages discussed in these notes include duckdb (of course), dplyr, data.table, fst, xts, RSQLite, and vroom, plus a little Python pandas by way of reticulate.

The notes exhibit a mild disdain for SQL. For a much more comprehensive discussion of the difficulties with SQL, see these really interesting notes by Jamie Brandon: https://scattered-thoughts.net/writing/against-sql/. As an alternative to SQL I generally prefer dplyr.

These notes present several interesting, if somewhat eclectic, data-sciency examples. For more comprehensive and straight-up database-style performance comparisons, see the excellent work by H2O here: https://h2oai.github.io/db-benchmark/ (where both R's data.table and DuckDB perform very well in general).

Also, you should check out https://github.com/pola-rs/polars, a remarkably high-performance new data frame implementation written in Rust and, for now, geared to Python. This is the first data frame-like environment I have seen that really gives R's data.table competition, aside from KDB+ of course.

Slides

Main overview:

Overview

The easy pieces:

A SQL rant born out of frustration while compiling these notes appears here:

Declarative, Schmerative

