Awesome resources for learning more about Apache Arrow, focussed on the R package arrow.
If you have any suggestions for other resources to add here, please submit a PR!
Key:
👩‍🏫 Workshop
📝 Blog post
📽️ Video
🗒️ Slides
- "Larger-Than-Memory Data Workflows with Apache Arrow" - useR! 2022 conference workshop 👩‍🏫
- "Doing More with Data: An Introduction to Arrow for R Users" by Danielle Navarro 📽️
- "Getting started with Apache Arrow" by Danielle Navarro 📝
- "Efficient Data Analysis on Larger-than-Memory Data with DuckDB and Arrow" by Tom Mock 📽️
- "Bigger data with arrow and duckdb" by Tom Mock & Edgar Ruiz 🗒️
- "New Directions for Apache Arrow" by Wes McKinney 📽️
- "Bigger Data With Ease Using Apache Arrow" by Neal Richardson 📽️
- "Apache Arrow: Enabling Data Engineering Tasks in R" by Ian Cook 📽️
- "Data serialisation in R" by Danielle Navarro 📝
- "Data types in Arrow and R" by Danielle Navarro 📝
- "Arrays and tables in Arrow" by Danielle Navarro 📝
- "Binding Apache Arrow to R" by Danielle Navarro 📝
- "Arrow New Feature Showcase: show_exec_plan()" by Nic Crane 📝
- "Creating an Arrow dataset: An exploration of the file formats that Arrow can read and write." by François Michonneau 📝
- "Creating an Arrow dataset (part 2): How does partitioning impact query performance?" by François Michonneau 📝
- "Understanding the Parquet file format" by Colin Gillespie 📝
- "Folks, C’mon, Use Parquet" by Piotr Storożenko 📝