Oct 21, 2024: This week in DataFusion #13035

alamb opened this issue Oct 21, 2024 · 7 comments

@alamb

alamb commented Oct 21, 2024

Introduction

This ticket is a weekly summary of interesting things happening in DataFusion. Note that this is not a complete list (it is what I remember / can find). Please feel free to leave comments on this ticket about anything I may have missed or that you think should get wider attention from the community.

Loosely inspired by https://this-week-in-rust.org/

Highlights from last week(s):

(I am sorry if I missed you -- please add a note to this ticket with anything you would like to add)

Looking to get more involved? Try code review!

DataFusion has a long history of community members contributing in all aspects of the project. Reviewing PRs is an especially great way to get introduced to the project, help the community and grow your own knowledge -- researching and understanding the code enough to review PRs also often inspires additional ideas for improvements.

We have docs about reviews. TL;DR: look for test coverage, check whether the change is understandable and well documented, and consider whether the code can be improved. When you think the PR looks good to merge, try @-mentioning one of the committers.

Help wanted

Please feel free to leave your own comments on this ticket if you are looking for help.

Andrew's Focus Areas:

We are preparing for the 43.0.0 release, and I am personally pretty excited about the following (and am thus actively helping with them / putting them at the top of my review list):

Recent and Upcoming Releases

Interesting discussions underway:

Community

Upcoming meetups:

Background:

Previous update: #12973

@Omega359

Congrats @goldmedal on becoming a committer! Thanks for all your hard work 🚀

@goldmedal

Many thanks!

alamb self-assigned this Oct 21, 2024
@alamb

alamb commented Oct 22, 2024

@XiangpengHao and @tustvold are also working on Adaptive Predicate Pushdown in the arrow-rs Parquet reader (apache/arrow-rs#5523). This has the potential to let DataFusion always push predicates into the Parquet decoder itself, further improving Parquet read performance.

As @Dandandan notes, this would let us finally do

See the great write-up and analysis from @tustvold here: apache/arrow-rs#6454 (comment)

cc @samuelcolvin

@alamb

alamb commented Oct 24, 2024

@SamSynnada started a great discussion about spreading the word about DataFusion more widely. Thank you 🙏 -- it is going to be a great year.

@Xuanwo

Xuanwo commented Oct 24, 2024

Hi, I want to highlight datafusion-contrib/datafusion-orc#120 as a great example of community collaboration.

Given DataFusion's unique position, I believe there are many opportunities for such collaboration. It's also a great chance to thank everyone who made it possible, especially those who have maintained the project for a year.

@alamb

alamb commented Oct 26, 2024

@2010YOUY01 is rallying a team for improvements in externalized hash aggregation:

@alamb

alamb commented Oct 29, 2024

Next week: #13167

alamb closed this as completed Oct 29, 2024
alamb unpinned this issue Oct 29, 2024