[Epic] Improved DataFusion Benchmarking #5505

Open
8 of 12 tasks
alamb opened this issue Mar 7, 2023 · 3 comments
Labels
enhancement New feature or request performance Make DataFusion faster

Comments

@alamb
Contributor

alamb commented Mar 7, 2023

Call to action:

Let's invest more effort in DataFusion benchmarking, both as a mechanism for technical evangelism and as a guide for actual performance improvements.

Background

We have several examples of performance “comparisons” showing DataFusion not doing well against DuckDB or pola.rs that were really tests of how fast CSV or JSON parsing can go (this blog is one such example). Recent work should make these comparisons much more favorable in the future.

It is in the interest of all projects based on DataFusion to focus on their own users and use cases rather than having to explain why they are using supposedly "inferior" technology due to misleading benchmark results (for example recently on ClickBench – see #5276).

Of course, improved benchmarking will not only help evangelize DataFusion, it will also directly guide the community’s optimization efforts.

Related Tickets

alamb added the enhancement (New feature or request) and performance (Make DataFusion faster) labels Mar 7, 2023
@comphead
Contributor

comphead commented Mar 7, 2023

@alamb 5276 included twice

@jackwener
Member

jackwener commented Mar 8, 2023

@alamb 5276 included twice

Thanks for the reminder, I have removed it.

alamb changed the title from "[Epic] DataFusion Benchmarking" to "[Epic] Improved DataFusion Benchmarking" Apr 26, 2023
@alamb
Contributor Author

alamb commented Apr 28, 2023

Here is a proposed PR to orchestrate running the benchmarks: #6131


3 participants