-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use row encoding for SortExec #5292
Conversation
still need to update metrics tracking and spilling row format to disk
update memory sizes in spill related unit tests to account for rows limit sorts dont use row encoding.
This looks good. Now that we have the benchmark too, can we have some numbers quantifying the performance from a before vs. after perspective? One, we would make sure that we are really helping 🙂 Two, we can announce this to the community at large if the numbers look good! |
I could post some preliminary bench results from my macbook once I finish up this last thing (row serialization). Also there are some regressions on the merge bench that I want to try and cleanup as well before taking this out of draft status 💪 |
- use SortedStream instead of dynamic dispatch everywhere, gets rid of all the Box::pin being when passing streams between funcs
fix lil typo error in sort bench
What is the status of this PR? Does it have more work needed or is it ready for final review? Or does it need more benchmark results? |
Coding-wise everything is finished and code is ready to review. But in terms of bench results, I'm not 100% confident yet. Sort micro-benchmarks are looking pretty good: significant improvements on cases where row encoding is actually used, minor regressions--mostly within error bars--on cases without row encoding but of course more experienced contributors would know better about how significant these regressions actually are:
In summary, I'd like to get an opinion on these micro bench results. And then also ideally, we can run the e2e bench comparisons (#5561) on |
Ok, cool -- thank you -- I will put this on my worklist for tomorrow morning. Thank you |
Starting to run some tests locally / review this PR more carefully |
I ran this command against both main and this branch:
The results seem to be consistent with the microbenchmarks reported above that this branch is currently slower in several cases |
Thanks @alamb . Any ideas on what might be going on? I'm a bit stumped 😅 |
I am also a bit stumped. I think I need to sit down and get some profiling data. I'll try and do it later today but realistically it probably won't be until this weekend. |
No rush! Does anything stick out to you in this small block of code (this is where the root of the perf issues might be coming from...) |
let mut to_sort: Vec<(usize, Row)> = rows.into_iter().enumerate().collect();
to_sort.sort_unstable_by(|(_, row_a), (_, row_b)| row_a.cmp(row_b));
let sorted_indices = to_sort.iter().map(|(idx, _)| *idx).collect::<Vec<_>>();
Update -- not sure about this anymore I am going to run some profiling experiments and see if I can confirm that this copy takes a significant amount of the execution time. |
I ran some profiling of using create table test as select * from '/tmp/logs.parquet' ORDER BY request_method; like ./target/release/datafusion-cli -f /tmp/script.sql Here is the flamegraph that came from I haven't had a chance to study this PR in detail -- but I think we are at the point where we should profile the benches / queries that got slower and figure out what is going on. Given we don't have much (any?) measurements showing a performance improvement I feel like something else must be going on |
MutableArrayData would suggest the majority of time is spent reconstructing the sorted data, it is not impossible that decoding from the sorted rows may be faster, or using the interleave kernel. At the very least this should reduce the amount of realloc. |
Thanks for the ideas and suggestions. I will start looking into this more.
There are several micro bench cases with significant improvements (mostly the tuple sort cases) which is why i'm a bit confused why the e2e parquet bench is showing worse perf on the tuple sorts
Will look into the interleave kernel that is a good idea thanks |
Marking as draft to signify this PR has feedback and is not waiting for another review at the moment. |
Sorry about the inactivity @alamb . Update on my end is that I kind of hit a wall with this. I see that @tustvold has some PRs open that seem like they are working towards tackling using row encoding for SortExec (probably will end up being better than my attempt here anyways). So perhaps we should just close this one? |
I think that would be best. Thank you for your efforts @jaylmiller -- I did not realize how tricky this would turn out to be. I think your work and initial findings have significantly influenced what @tustvold is up to so while maybe this code won't get merged as is it will live on in spirit. Thank you very much, again |
Which issue does this PR close?
Closes #5230.
Rationale for this change
Using arrow row format for SortExec should improve sorting speed. Additionally preserving that row encoding to be used
in SortPreservingMergeExec should improve perf by minimizing the amount of encoding needed to be done.
What changes are included in this PR?
datafusion/core/src/physical_plan/sorts
).Are these changes tested?
Are there any user-facing changes?
Had to add a few new public exports since SortKeyCursor needs to be public (since it is used in an integration test).