use row encoding for SortExec #5292

jaylmiller · 2023-02-15T20:08:28Z

Which issue does this PR close?

Closes #5230.

Rationale for this change

Using arrow row format for SortExec should improve sorting speed. Additionally preserving that row encoding to be used
in SortPreservingMergeExec should improve perf by minimizing the amount of encoding needed to be done.

What changes are included in this PR?

Changes are all made within the sorts module (datafusion/core/src/physical_plan/sorts).
Had to change several files across the module to allow for streaming row encoding data between SortExec and SortPreservingMergeExec

Are these changes tested?

Existing unit tests for SortExec/SortPreservingMergeExec cover this for the most part.
For the new structs defined for handling selections of rows have new test.

Are there any user-facing changes?

Had to add a few new public exports since SortKeyCursor needs to be public (since it is used in an integration test).

… first

still need to update metrics tracking and spilling row format to disk

update memory sizes in spill related unit tests to account for rows limit sorts dont use row encoding.

datafusion/core/src/physical_plan/sorts/sort.rs

ozankabak · 2023-02-15T21:37:32Z

This looks good. Now that we have the benchmark too, can we have some numbers quantifying the performance from a before vs. after perspective?

One, we would make sure that we are really helping 🙂 Two, we can announce this to the community at large if the numbers look good!

jaylmiller · 2023-02-15T22:54:05Z

This looks good. Now that we have the benchmark too, can we have some numbers quantifying the performance from a before vs. after perspective?

One, we would make sure that we are really helping 🙂 Two, we can announce this to the community at large if the numbers look good!

I could post some preliminary bench results from my macbook once I finish up this last thing (row serialization).

Also there are some regressions on the merge bench that I want to try and cleanup as well before taking this out of draft status 💪

- use SortedStream instead of dynamic dispatch everywhere, gets rid of all the Box::pin being when passing streams between funcs

fix lil typo error in sort bench

alamb · 2023-03-14T14:39:42Z

What is the status of this PR? Does it have more work needed or is it ready for final review? Or does it need more benchmark results?

jaylmiller · 2023-03-14T15:33:05Z

Coding-wise everything is finished and code is ready to review. But in terms of bench results, I'm not 100% confident yet.

Sort micro-benchmarks are looking pretty good: significant improvements on cases where row encoding is actually used, minor regressions--mostly within error bars--on cases without row encoding but of course more experienced contributors would know better about how significant these regressions actually are:

group                                                     main-sort                                rows-sort
-----                                                     ---------                                ---------
sort f64                                                  1.00     10.8±0.23ms        ? ?/sec      1.04     11.2±0.93ms        ? ?/sec
sort f64 preserve partitioning                            1.00      4.0±0.27ms        ? ?/sec      1.04      4.1±0.28ms        ? ?/sec
sort i64                                                  1.00      9.5±0.55ms        ? ?/sec      1.09     10.3±0.74ms        ? ?/sec
sort i64 preserve partitioning                            1.00      3.3±0.10ms        ? ?/sec      1.06      3.5±0.13ms        ? ?/sec
sort mixed tuple                                          1.28     28.3±3.35ms        ? ?/sec      1.00     22.2±1.60ms        ? ?/sec
sort mixed tuple preserve partitioning                    1.00      3.6±0.17ms        ? ?/sec      1.15      4.1±1.09ms        ? ?/sec
sort mixed utf8 dictionary tuple                          2.84     52.7±8.27ms        ? ?/sec      1.00     18.6±1.29ms        ? ?/sec
sort mixed utf8 dictionary tuple preserve partitioning    1.02      4.2±0.92ms        ? ?/sec      1.00      4.1±0.55ms        ? ?/sec
sort utf8 dictionary                                      1.00      3.7±0.21ms        ? ?/sec      1.04      3.9±0.33ms        ? ?/sec
sort utf8 dictionary preserve partitioning                1.00  1487.2±1444.67µs        ? ?/sec    1.01  1502.8±315.79µs        ? ?/sec
sort utf8 dictionary tuple                                3.26    57.0±11.35ms        ? ?/sec      1.00     17.5±2.08ms        ? ?/sec
sort utf8 dictionary tuple preserve partitioning          1.13      4.1±1.08ms        ? ?/sec      1.00      3.6±0.52ms        ? ?/sec
sort utf8 high cardinality                                1.01     28.0±3.70ms        ? ?/sec      1.00     27.6±3.81ms        ? ?/sec
sort utf8 high cardinality preserve partitioning          1.00     11.1±1.48ms        ? ?/sec      1.21     13.5±3.38ms        ? ?/sec
sort utf8 low cardinality                                 1.00     15.3±5.08ms        ? ?/sec      1.10     16.9±6.20ms        ? ?/sec
sort utf8 low cardinality preserve partitioning           1.03      8.1±2.21ms        ? ?/sec      1.00      7.8±1.75ms        ? ?/sec
sort utf8 tuple                                           1.96     56.8±8.36ms        ? ?/sec      1.00     29.0±4.82ms        ? ?/sec
sort utf8 tuple preserve partitioning                     1.02      6.7±0.95ms        ? ?/sec      1.00      6.5±0.46ms        ? ?/sec

In summary, I'd like to get an opinion on these micro bench results. And then also ideally, we can run the e2e bench comparisons (#5561) on tpch and parquet and get a bit more data on whether this change is worth merging.

alamb · 2023-03-15T15:35:16Z

In summary, I'd like to get an opinion on these micro bench results. And then also ideally, we can run the e2e bench comparisons (#5561) on tpch and parquet and get a bit more data on whether this change is worth merging.

Ok, cool -- thank you -- I will put this on my worklist for tomorrow morning. Thank you

alamb · 2023-03-16T13:44:14Z

Starting to run some tests locally / review this PR more carefully

alamb · 2023-03-16T18:37:56Z

I ran this command against both main and this branch:

cargo run --release --bin parquet -- sort  --path ~/access-log-gen/ --scale-factor 1.0

The results seem to be consistent with the microbenchmarks reported above that this branch is currently slower in several cases

main_sort.txt
new_sort.txt

jaylmiller · 2023-03-16T19:12:55Z

Thanks @alamb . Any ideas on what might be going on? I'm a bit stumped 😅

alamb · 2023-03-17T10:40:12Z

Thanks @alamb . Any ideas on what might be going on? I'm a bit stumped 😅

I am also a bit stumped. I think I need to sit down and get some profiling data. I'll try and do it later today but realistically it probably won't be until this weekend.

jaylmiller · 2023-03-17T14:06:24Z

No rush! Does anything stick out to you in this small block of code (this is where the root of the perf issues might be coming from...)

https://github.com/jaylmiller/arrow-datafusion/blob/d20263866a77720da33b69a6416cd086574f6968/datafusion/core/src/physical_plan/sorts/sort.rs#L1087-L1112

alamb · 2023-03-19T10:28:02Z

No rush! Does anything stick out to you in this small block of code (this is where the root of the perf issues might be coming from...)

https://github.com/jaylmiller/arrow-datafusion/blob/d20263866a77720da33b69a6416cd086574f6968/datafusion/core/src/physical_plan/sorts/sort.rs#L1103-L1105

            let mut to_sort: Vec<(usize, Row)> = rows.into_iter().enumerate().collect();
            to_sort.sort_unstable_by(|(_, row_a), (_, row_b)| row_a.cmp(row_b));
            let sorted_indices = to_sort.iter().map(|(idx, _)| *idx).collect::<Vec<_>>();

~~I think the sort_unstable_by will effectively copy the rows around (to sort them in the vec) and then calling the take kernel will effectively copy them again to form the output.~~

Update -- not sure about this anymore

I am going to run some profiling experiments and see if I can confirm that this copy takes a significant amount of the execution time.

alamb · 2023-03-19T11:36:09Z

I ran some profiling of using datafusion-cli to run this query

create table  test as select * from  '/tmp/logs.parquet' ORDER BY request_method;

like

./target/release/datafusion-cli  -f /tmp/script.sql

Here is the flamegraph that came from hotspot. I tried to annotate it the best I can but I am not quite sure how to explain it

I haven't had a chance to study this PR in detail -- but I think we are at the point where we should profile the benches / queries that got slower and figure out what is going on.

Given we don't have much (any?) measurements showing a performance improvement I feel like something else must be going on

tustvold · 2023-03-19T12:08:47Z

MutableArrayData would suggest the majority of time is spent reconstructing the sorted data, it is not impossible that decoding from the sorted rows may be faster, or using the interleave kernel. At the very least this should reduce the amount of realloc.

jaylmiller · 2023-03-19T15:37:53Z

I think we are at the point where we should profile the benches / queries that got slower and figure out what is going on.

Thanks for the ideas and suggestions. I will start looking into this more.

Given we don't have much (any?) measurements showing a performance improvement I feel like something else must be going on

There are several micro bench cases with significant improvements (mostly the tuple sort cases) which is why i'm a bit confused why the e2e parquet bench is showing worse perf on the tuple sorts

MutableArrayData would suggest the majority of time is spent reconstructing the sorted data, it is not impossible that decoding from the sorted rows may be faster, or using the interleave kernel. At the very least this should reduce the amount of realloc.

Will look into the interleave kernel that is a good idea thanks

alamb · 2023-03-28T20:27:07Z

Marking as draft to signify this PR has feedback and is not waiting for another review at the moment.

jaylmiller · 2023-04-05T21:21:08Z

Marking as draft to signify this PR has feedback and is not waiting for another review at the moment.

Sorry about the inactivity @alamb . Update on my end is that I kind of hit a wall with this. I see that @tustvold has some PRs open that seem like they are working towards tackling using row encoding for SortExec (probably will end up being better than my attempt here anyways). So perhaps we should just close this one?

alamb · 2023-04-06T17:41:13Z

Update on my end is that I kind of hit a wall with this. I see that @tustvold has some PRs open that seem like they are working towards tackling using row encoding for SortExec (probably will end up being better than my attempt here anyways). So perhaps we should just close this one?

I think that would be best. Thank you for your efforts @jaylmiller -- I did not realize how tricky this would turn out to be. I think your work and initial findings have significantly influenced what @tustvold is up to so while maybe this code won't get merged as is it will live on in spirit.

Thank you very much, again

jaylmiller added 14 commits February 10, 2023 11:56

modify sort_batch to use arrow row format for multi-column sorts

ec44910

use row encoding for in memory partial sorting within SortExec

c7c43e4

Revert preserving row encoding changes

2e1c143

add bench for SortExec

2ab03ee

fix preserve partitioning case to run every partition instead of just…

11be061

… first

rough draft: sorting works.

730a89c

still need to update metrics tracking and spilling row format to disk

Merge branch 'apache:master' into master

4b1ea08

Merge branch 'sort-preserve-row-encoding'

a64705f

remove some todos

7aaddb5

fix clippy warnings

96a2e15

checkpointing

a3c632c

SortPreservingMergeStream emits row encodings when used from SortExec

c51b23c

spill logic working (w/ a temporary serialization format solution)

0f7bfc3

Merge branch 'sort-exec'

cdf72d8

github-actions bot added the core Core DataFusion crate label Feb 15, 2023

jaylmiller added 2 commits February 15, 2023 16:00

add row encoding sizes to spill calculations.

e313134

update memory sizes in spill related unit tests to account for rows limit sorts dont use row encoding.

clean comments and small todos

8b2450b

jaylmiller commented Feb 15, 2023

View reviewed changes

datafusion/core/src/physical_plan/sorts/sort.rs Outdated Show resolved Hide resolved

jaylmiller added 3 commits February 16, 2023 15:18

cleanup SortedStream types

085a871

- use SortedStream instead of dynamic dispatch everywhere, gets rid of all the Box::pin being when passing streams between funcs

add SortExec input case to each merge bench case

4196a25

fix lil typo error in sort bench

row serialization format

d4f5c10

jaylmiller marked this pull request as ready for review February 16, 2023 23:02

jaylmiller marked this pull request as draft February 17, 2023 00:46

RowBatch construction re-use row ref offsets instead of just appending

33c611c

jaylmiller marked this pull request as ready for review February 17, 2023 01:25

jaylmiller added 3 commits February 17, 2023 13:46

Merge branch 'apache:main' into master

331d205

Merge branch 'apache:main' into master

cba6d30

dont need to keep array refs if we use rows

1513f9a

jaylmiller added 7 commits March 4, 2023 15:34

Merge branch 'apache:main' into master

e80578b

use mergesort in the merge step of sort exec

b82545e

remove experimental bench cases

c685a0d

move gating logic outside of sort_batch

08b3fe5

dont use row encoding on single batch code path

790546f

batch insertion order fix

d13912b

Merge branch 'apache:main' into master

fe79275

alamb mentioned this pull request Mar 12, 2023

Report and compare benchmark runs against two branches #5561

Closed

Merge branch 'apache:main' into master

0a37892

Merge branch 'apache:main' into master

d202638

jaylmiller mentioned this pull request Mar 20, 2023

Improve performance of COUNT (distinct x) for dictionary columns #258 #5554

Closed

alamb mentioned this pull request Mar 23, 2023

Memory Limited Joins (Externalized / Spill) #1599

Open

5 tasks

alamb marked this pull request as draft March 28, 2023 20:27

tustvold mentioned this pull request Apr 3, 2023

Use SortPreservingMerge for in memory sort #5851

Closed

alamb closed this Apr 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

use row encoding for SortExec #5292

use row encoding for SortExec #5292

jaylmiller commented Feb 15, 2023

ozankabak commented Feb 15, 2023

jaylmiller commented Feb 15, 2023 •

edited

Loading

alamb commented Mar 14, 2023

jaylmiller commented Mar 14, 2023 •

edited

Loading

alamb commented Mar 15, 2023

alamb commented Mar 16, 2023

alamb commented Mar 16, 2023

jaylmiller commented Mar 16, 2023

alamb commented Mar 17, 2023

jaylmiller commented Mar 17, 2023

alamb commented Mar 19, 2023 •

edited

Loading

alamb commented Mar 19, 2023 •

edited

Loading

tustvold commented Mar 19, 2023 •

edited

Loading

jaylmiller commented Mar 19, 2023 •

edited

Loading

alamb commented Mar 28, 2023

jaylmiller commented Apr 5, 2023 •

edited

Loading

alamb commented Apr 6, 2023

use row encoding for SortExec #5292

use row encoding for SortExec #5292

Conversation

jaylmiller commented Feb 15, 2023

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

ozankabak commented Feb 15, 2023

jaylmiller commented Feb 15, 2023 • edited Loading

alamb commented Mar 14, 2023

jaylmiller commented Mar 14, 2023 • edited Loading

alamb commented Mar 15, 2023

alamb commented Mar 16, 2023

alamb commented Mar 16, 2023

jaylmiller commented Mar 16, 2023

alamb commented Mar 17, 2023

jaylmiller commented Mar 17, 2023

alamb commented Mar 19, 2023 • edited Loading

alamb commented Mar 19, 2023 • edited Loading

tustvold commented Mar 19, 2023 • edited Loading

jaylmiller commented Mar 19, 2023 • edited Loading

alamb commented Mar 28, 2023

jaylmiller commented Apr 5, 2023 • edited Loading

alamb commented Apr 6, 2023

jaylmiller commented Feb 15, 2023 •

edited

Loading

jaylmiller commented Mar 14, 2023 •

edited

Loading

alamb commented Mar 19, 2023 •

edited

Loading

alamb commented Mar 19, 2023 •

edited

Loading

tustvold commented Mar 19, 2023 •

edited

Loading

jaylmiller commented Mar 19, 2023 •

edited

Loading

jaylmiller commented Apr 5, 2023 •

edited

Loading