Update ClickBench benchmarks with DataFusion 34 #8789

Closed
alamb opened this issue Jan 8, 2024 · 7 comments
Labels
enhancement New feature or request performance Make DataFusion faster

Comments

@alamb
Contributor

alamb commented Jan 8, 2024

Is your feature request related to a problem or challenge?

DataFusion 34 has been released. It would be great to update ClickBench https://benchmark.clickhouse.com/ with runs from the latest version.

It appears we are still at 33: https://github.com/ClickHouse/ClickBench/blob/main/datafusion/results/partitioned.json

Describe the solution you'd like

Perhaps we can follow the model of ClickHouse/ClickBench#145 (thanks @kmitchener)

Describe alternatives you've considered

No response

Additional context

No response

@alamb alamb added enhancement New feature or request performance Make DataFusion faster labels Jan 8, 2024
@kmitchener
Contributor

kmitchener commented Jan 9, 2024

I re-ran the benchmark and v34 is actually a little bit slower than v33. I filter the benchmark for just type: "stateless" on machine: "c6a.4xlarge, 500gb gp2" to compare against similar systems. You can compare the index.html from the PR to the one on the site: filtered benchmark.
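For anyone who wants to repeat that filtering outside the web page, here is a minimal sketch. It assumes each result entry is a JSON object with `system`, `machine`, and `tags` fields; the sample entries and the `select_comparable` helper are hypothetical and only illustrate the idea of keeping "stateless" systems on one machine type.

```python
import json

# Hypothetical entries; the field names are an assumption about the result
# file layout, and the systems listed here are made up for illustration.
SAMPLE = json.loads("""
[
  {"system": "engine-a", "machine": "c6a.4xlarge, 500gb gp2", "tags": ["Rust", "stateless"]},
  {"system": "engine-b", "machine": "c6a.metal, 500gb gp2", "tags": ["C++", "stateless"]},
  {"system": "engine-c", "machine": "c6a.4xlarge, 500gb gp2", "tags": ["C++", "stateful"]}
]
""")


def select_comparable(entries, tag, machine):
    """Keep only entries that carry the given tag and ran on the given machine."""
    return [
        entry
        for entry in entries
        if tag in entry.get("tags", []) and entry.get("machine") == machine
    ]


comparable = select_comparable(SAMPLE, "stateless", "c6a.4xlarge, 500gb gp2")
print([entry["system"] for entry in comparable])  # prints ['engine-a']
```

Restricting the comparison this way avoids mixing results from different hardware or from systems that keep state between runs.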

@alamb
Contributor Author

alamb commented Jan 9, 2024

> I re-ran the benchmark and v34 is actually a little bit slower than v33. I filter the benchmark for just type: "stateless" on machine: "c6a.4xlarge, 500gb gp2" to compare against similar systems. You can compare the index.html from the PR to the one on the site: filtered benchmark.

Thanks @kmitchener!

Do you have a sense for how reproducible the numbers are? Many of the reported performance differences are less than 100ms, and at that range I have found significant variation.

Did you run it more than once, and if so, were the results reproducible?

@kmitchener
Contributor

> Do you have a sense for how reproducible the numbers are? Many of the reported performance differences are less than 100ms, and at that range I have found significant variation.
>
> I wonder if you ran it more than once and are they reproducible?

I ran this on the same instance that I ran the v33 benchmarks on (it's been powered down since the v33 run). The numbers are consistently slightly slower compared to v33. I ran it twice, since it was unexpected. I also updated to latest Rust and ran again -- that's what I published.

I'll run the v33 benchmark again on the same instance and see what it produces.
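As a rough way to separate real regressions from noise when comparing two versions, something like the sketch below could be used. It assumes each version's results are a list of per-query timings in seconds (several runs per query), takes the best run per query, and flags differences smaller than the 100ms range mentioned above. The numbers, the `compare` helper, and the `noise_floor_s` parameter are all hypothetical, not measured data or part of any benchmark tooling.

```python
def best_times(runs_per_query):
    """Best (minimum) time per query; failed runs recorded as None are ignored.

    Assumes every query has at least one successful run.
    """
    return [min(t for t in runs if t is not None) for runs in runs_per_query]


def compare(old_runs, new_runs, noise_floor_s=0.1):
    """Per-query change between two versions, flagging deltas below the noise floor."""
    rows = []
    pairs = zip(best_times(old_runs), best_times(new_runs))
    for query, (old, new) in enumerate(pairs):
        delta = new - old
        rows.append(
            {
                "query": query,
                "old_s": old,
                "new_s": new,
                "pct_change": 100.0 * delta / old,
                "within_noise": abs(delta) < noise_floor_s,
            }
        )
    return rows


# Made-up timings for two queries, three runs each (seconds).
old_version = [[1.00, 0.50, 0.50], [2.00, 1.00, 1.00]]
new_version = [[1.10, 0.54, 0.55], [2.10, 1.20, 1.25]]

for row in compare(old_version, new_version):
    print(row)
```

In this made-up example the first query is about 8% slower but the absolute difference is only 40ms, so it is flagged as within noise, while the second query's 200ms difference is not.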

@kmitchener
Contributor

I've re-run the v33 benchmarks on the same instance and modified the benchmark so it will display both 33 and 34 at the same time so you can compare the runs:
image

You can grab that from -> https://github.com/kmitchener/ClickBench/blob/new-run-of-datafusion-33/index.html

@alamb
Contributor Author

alamb commented Jan 10, 2024

🤔 I don't really have any good reason / understanding of what is different

I looked at q13 briefly which has the largest reported change and I don't really have any good explanation about why it is reported 8% slower.

Screenshot 2024-01-10 at 3 46 22 PM

I also profiled it running locally:
Screenshot 2024-01-10 at 4 25 27 PM

It doesn't seem to show anything obvious -- the biggest offenders are managing the group keys which I don't think changed much

@alamb
Contributor Author

alamb commented Jan 11, 2024

Thanks @kmitchener -- I filed #8836 to look into the results

@alamb
Contributor Author

alamb commented Feb 29, 2024

The results for 34 were merged, and we just released 36 (including #8827). Let's track the work for updating ClickBench for DataFusion 36 in #9404

Thanks again @kmitchener

@alamb alamb closed this as completed Feb 29, 2024