Specialize Avg and Sum accumulators (#6842) #7358

Merged
merged 4 commits into from
Aug 23, 2023

Conversation

@tustvold tustvold commented Aug 21, 2023

Which issue does this PR close?

Part of #6842

Rationale for this change

This makes it easier to see what is going on and avoids using ScalarValue arithmetic.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@tustvold tustvold added the api change Changes the API exposed to users of the crate label Aug 21, 2023
@github-actions github-actions bot added physical-expr Physical Expressions optimizer Optimizer rules core Core DataFusion crate labels Aug 21, 2023
let delta = sum_batch(values, &self.sum.get_datatype())?;
self.sum = self.sum.sub(&delta)?;
if let Some(x) = sum(values) {
self.sum = Some(self.sum.unwrap() - x);
Contributor Author

This feels instinctively wrong, as it will accumulate errors over time... I'm not really sure what to do about this though...

Contributor

I think this is expected for floats (otherwise we would need to keep intermediate values). Switching to decimal should allow for precise values.

FYI @ozankabak @metesynnada

Contributor

We will be looking into this.

Contributor

The calculation looks correct. There is indeed some error accumulation when doing incremental calculations, but it is unavoidable (and very rarely causes issues in practice).
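To make the concern in this thread concrete, here is a minimal sketch of an incremental (update/retract) sum over floats next to the same sequence over scaled integers, which is roughly what a decimal representation gives you. `FloatSum` and `DecimalSum` are hypothetical stand-ins invented for illustration, not the accumulators in this PR, and the scale of 2 decimal places is an assumption.

```rust
/// Hypothetical sliding sum over f64, mirroring the `self.sum - x` retraction above.
#[derive(Default)]
struct FloatSum {
    sum: f64,
}

impl FloatSum {
    fn update_batch(&mut self, values: &[f64]) {
        self.sum += values.iter().sum::<f64>();
    }
    fn retract_batch(&mut self, values: &[f64]) {
        self.sum -= values.iter().sum::<f64>();
    }
}

/// Hypothetical sliding sum over decimals stored as integers scaled by 10^2.
#[derive(Default)]
struct DecimalSum {
    sum: i128,
}

impl DecimalSum {
    fn update_batch(&mut self, values: &[i128]) {
        self.sum += values.iter().sum::<i128>();
    }
    fn retract_batch(&mut self, values: &[i128]) {
        self.sum -= values.iter().sum::<i128>();
    }
}

/// Add 1e17, add 1.0, then slide the window past 1e17.
fn float_demo() -> f64 {
    let mut acc = FloatSum::default();
    acc.update_batch(&[1e17]);
    acc.update_batch(&[1.0]); // 1.0 is below the f64 spacing at 1e17, so it is lost
    acc.retract_batch(&[1e17]);
    acc.sum
}

/// The same sequence with values scaled by 100 (two decimal places).
fn decimal_demo() -> i128 {
    let scale: i128 = 100;
    let big: i128 = 100_000_000_000_000_000 * scale; // 1e17
    let mut acc = DecimalSum::default();
    acc.update_batch(&[big]);
    acc.update_batch(&[1 * scale]);
    acc.retract_batch(&[big]);
    acc.sum
}

fn main() {
    // The window now holds only the value 1, but the float sum reports 0.
    println!("float sum:   {}", float_demo()); // prints 0
    println!("decimal sum: {}", decimal_demo()); // prints 100, i.e. 1.00
}
```

The float version never recovers the lost `1.0` without keeping the intermediate values, which is the trade-off described above; the integer-backed version stays exact as long as it does not overflow.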

use datafusion_expr::aggregate_function::sum_type_of_avg;
use datafusion_expr::type_coercion::aggregates::avg_return_type;

fn test_with_pre_cast(array: ArrayRef, expected: ScalarValue) {
Contributor Author

@tustvold tustvold Aug 21, 2023

This change is necessary because generic_test_op would call Avg::new which would then not have the correct arguments. This replicates the logic in build_in

// instantiate specialized accumulator based on the type
match (&self.sum_data_type, &self.rt_data_type) {
(Float64, Float64) => {
Ok(Box::new(AvgAccumulator::new(self.pre_cast_to_sum_type)))
Contributor Author

This whole precast thing seems like a hack; imo this should be handled by the type coercion machinery, not internal to the aggregator...

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Aug 22, 2023
@tustvold tustvold marked this pull request as draft August 22, 2023 13:05
@tustvold
Contributor Author

Marking as draft as I'd like to get #7369 in first

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Aug 22, 2023
@tustvold tustvold changed the title Specialize AvgAccumulator (#6842) Specialize Avg and Sum (#6842) Aug 22, 2023
@github-actions github-actions bot removed the logical-expr Logical plan and expressions label Aug 22, 2023
@tustvold tustvold marked this pull request as ready for review August 22, 2023 15:36

impl<T: ToByteSlice> std::hash::Hash for Hashable<T> {
fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
self.0.to_byte_slice().hash(state)
Contributor

@Dandandan Dandandan Aug 22, 2023

Elsewhere I think we only do this for floats and use state.hash_one(self.0) for the other types (for primitives it's faster).

Contributor Author

@tustvold tustvold Aug 22, 2023

This does use ahash: it overrides the BuildHasher used by the HashSet. I think we could definitely do something better here, but in the absence of benchmarks I'm keen to go with simple, and we can always optimise it later down the line. Regardless, this should be significantly faster than the prior approach.

Edit: If someone really cares about the performance of DistinctSum, implementing a GroupsAccumulator will likely yield far greater performance than any incremental tweaking of this Accumulator-based version.

Contributor

> Edit: If someone really cares about the performance of DistinctSum, implementing a GroupsAccumulator will likely yield far greater performance than any incremental tweaking of this Accumulator-based version

yeah true :)
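For readers unfamiliar with why a `Hashable` wrapper is needed at all: floats do not implement `Hash` or `Eq`, so they cannot go into a `HashSet` directly. The sketch below shows the general idea of hashing and comparing on the raw bits. It is only an illustration under assumptions: `HashableF64` and `distinct_sum` are hypothetical names, it uses `f64::to_bits` rather than arrow's `ToByteSlice`, and it uses the standard library's default hasher rather than ahash.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical wrapper that makes an f64 usable as a HashSet key
/// by hashing and comparing its bit pattern.
#[derive(Debug, Clone, Copy)]
struct HashableF64(f64);

impl Hash for HashableF64 {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.0.to_bits().hash(state)
    }
}

impl PartialEq for HashableF64 {
    fn eq(&self, other: &Self) -> bool {
        // Bitwise equality: NaN equals an identical NaN, and 0.0 differs from -0.0.
        self.0.to_bits() == other.0.to_bits()
    }
}

impl Eq for HashableF64 {}

/// Sum each distinct value once, in first-seen order.
fn distinct_sum(values: &[f64]) -> f64 {
    let mut seen = HashSet::new();
    let mut sum = 0.0;
    for &v in values {
        // `insert` returns true only the first time a value is seen.
        if seen.insert(HashableF64(v)) {
            sum += v;
        }
    }
    sum
}

fn main() {
    // 1.5 and 2.5 each appear twice but are counted once: 1.5 + 2.5 + 4.0
    println!("{}", distinct_sum(&[1.5, 2.5, 1.5, 2.5, 4.0])); // prints 8
}
```

Comparing on bits is a deliberate choice in this sketch: it keeps `Eq` and `Hash` consistent with each other, at the cost of treating `0.0` and `-0.0` as different values.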

Ok(Box::new(DistinctSumAccumulator::try_new(&self.data_type)?))
macro_rules! helper {
($t:ty, $dt:expr) => {
Ok(Box::new(DistinctSumAccumulator::<$t>::try_new(&$dt)?))
Contributor

I think we can do the same for DistinctCountAccumulator

Contributor Author

Quite possibly, right now I'm just doing the minimum to be able to remove the ScalarValue arithmetic kernels 😅

Contributor

DistinctCountAccumulator also uses ScalarValue ;)

Contributor Author

But not arithmetic 😄

Contributor

Aha 😁 let me follow up this PR then ;)

@alamb alamb changed the title Specialize Avg and Sum (#6842) Specialize Avg and Sum accumulators (#6842) Aug 22, 2023
Contributor

@alamb alamb left a comment

Thank you @tustvold -- these changes make sense to me.

I think we should run some basic performance tests too if we have any that cover queries that use these functions (like SUM DISTINCT and using sliding sum in window functions). Maybe @metesynnada or @ozankabak know of some benchmarks we can run that cover it

target_scale: *target_scale,
})),
_ => not_impl_err!(
"AvgGroupsAccumulator for ({} --> {})",
Contributor

Suggested change
- "AvgGroupsAccumulator for ({} --> {})",
+ "AvgAccumulator for ({} --> {})",

}
downcast_sum!(self, helper)
Contributor

The multiple levels of macros are concise, I'll give you that, but I do find them hard to follow. Maybe that is OK, as we don't expect this to change.

$FN,
)))
}};
/// Sum only supports a subset of numeric types, instead relying on type coercion
Contributor

It might help to document what this macro does (i.e. it calls the helper macro given an s (what is s? The aggregate?)).
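Since the nested macros came up twice in review, here is a stripped-down sketch of the pattern: an outer dispatch macro matches on a data type and invokes a caller-supplied helper macro with the corresponding native Rust type. Everything here is simplified and hypothetical (the `DataType` enum, the `Native` trait, `SumAccumulator`, `create_accumulator`, and the `String` error type are assumptions for illustration); only the names `downcast_sum!` and `helper!` are borrowed from the diff, and the real macros take different arguments.

```rust
#[derive(Debug)]
enum DataType {
    Int64,
    UInt64,
    Float64,
    Utf8,
}

/// Hypothetical marker trait for the native types a sum can be specialized on.
trait Native: Default + 'static {
    const NAME: &'static str;
}
impl Native for i64 {
    const NAME: &'static str = "i64";
}
impl Native for u64 {
    const NAME: &'static str = "u64";
}
impl Native for f64 {
    const NAME: &'static str = "f64";
}

trait Accumulator {
    fn describe(&self) -> String;
}

#[derive(Default)]
struct SumAccumulator<T: Native> {
    #[allow(dead_code)]
    sum: T,
}

impl<T: Native> Accumulator for SumAccumulator<T> {
    fn describe(&self) -> String {
        format!("SumAccumulator<{}>", T::NAME)
    }
}

/// Outer macro: maps a runtime data type to a native type and hands that
/// type to the helper macro named by the caller.
macro_rules! downcast_sum {
    ($data_type:expr, $helper:ident) => {
        match $data_type {
            DataType::Int64 => $helper!(i64),
            DataType::UInt64 => $helper!(u64),
            DataType::Float64 => $helper!(f64),
            other => Err(format!("Sum not supported for {:?}", other)),
        }
    };
}

fn create_accumulator(data_type: &DataType) -> Result<Box<dyn Accumulator>, String> {
    // Inner macro: says what to build once the native type `$t` is known.
    macro_rules! helper {
        ($t:ty) => {
            Ok(Box::new(SumAccumulator::<$t>::default()) as Box<dyn Accumulator>)
        };
    }
    downcast_sum!(data_type, helper)
}

fn main() {
    for dt in [DataType::Int64, DataType::UInt64, DataType::Float64, DataType::Utf8] {
        match create_accumulator(&dt) {
            Ok(acc) => println!("{:?} -> {}", dt, acc.describe()),
            Err(e) => println!("{:?} -> error: {}", dt, e),
        }
    }
}
```

The split keeps the type-to-native mapping in one place while letting each call site decide what to construct, which is why it is concise but takes a moment to read.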

@ozankabak
Contributor

> I think we should run some basic performance tests too if we have any that cover queries that use these functions (like SUM DISTINCT and using sliding sum in window functions). Maybe @metesynnada or @ozankabak know of some benchmarks we can run that cover it

We will discuss this tomorrow and circle back to you

Contributor

@Dandandan Dandandan left a comment

Looks great 👍

@tustvold
Contributor Author

Running the TPCH benchmarks

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      main ┃ specialize-avg-accumulator ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Query 1      │  671.28ms │                   653.98ms │     no change │
│ Query 2      │  147.40ms │                   136.89ms │ +1.08x faster │
│ Query 3      │  259.99ms │                   252.70ms │     no change │
│ Query 4      │  150.08ms │                   145.15ms │     no change │
│ Query 5      │  347.39ms │                   348.76ms │     no change │
│ Query 6      │  152.53ms │                   147.19ms │     no change │
│ Query 7      │  595.27ms │                   560.30ms │ +1.06x faster │
│ Query 8      │  382.90ms │                   368.43ms │     no change │
│ Query 9      │  624.58ms │                   593.67ms │     no change │
│ Query 10     │  462.84ms │                   452.43ms │     no change │
│ Query 11     │  138.77ms │                   127.82ms │ +1.09x faster │
│ Query 12     │  221.17ms │                   213.70ms │     no change │
│ Query 13     │  389.77ms │                   375.43ms │     no change │
│ Query 14     │  209.97ms │                   209.11ms │     no change │
│ Query 15     │  149.18ms │                   144.03ms │     no change │
│ Query 16     │  134.44ms │                   114.72ms │ +1.17x faster │
│ Query 17     │  707.01ms │                   694.61ms │     no change │
│ Query 18     │ 1066.54ms │                  1086.68ms │     no change │
│ Query 19     │  384.12ms │                   367.69ms │     no change │
│ Query 20     │  343.48ms │                   317.19ms │ +1.08x faster │
│ Query 21     │  850.10ms │                   833.78ms │     no change │
│ Query 22     │  107.96ms │                   100.84ms │ +1.07x faster │
└──────────────┴───────────┴────────────────────────────┴───────────────┘
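As a reading aid for the Change column, the sketch below reproduces its labels from the two timing columns. The 5% noise threshold is an assumption: it is consistent with every row shown above, but the table itself does not state it. The function name `describe_change` and the wording of the "slower" label are likewise hypothetical, since no row in this run was slower beyond the threshold.

```rust
/// Classify a timing change given baseline and comparison times in milliseconds.
/// `noise_threshold` is the relative band treated as noise (assumed 0.05 here).
fn describe_change(baseline_ms: f64, comparison_ms: f64, noise_threshold: f64) -> String {
    let change = comparison_ms / baseline_ms;
    if change >= 1.0 - noise_threshold && change <= 1.0 + noise_threshold {
        "no change".to_string()
    } else if change < 1.0 {
        // Report the speedup as baseline / comparison.
        format!("+{:.2}x faster", 1.0 / change)
    } else {
        // Hypothetical label; no row in the table above exercises this branch.
        format!("{:.2}x slower", change)
    }
}

fn main() {
    // Query 16: 134.44ms on main vs 114.72ms on the branch.
    println!("{}", describe_change(134.44, 114.72, 0.05)); // prints +1.17x faster
    // Query 9: 624.58ms vs 593.67ms is within the 5% band.
    println!("{}", describe_change(624.58, 593.67, 0.05)); // prints no change
}
```

Read this way, the six "faster" rows are the queries whose branch time fell more than 5% below the time on main; the rest are within noise.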

@tustvold tustvold merged commit 6c785d1 into apache:main Aug 23, 2023
21 checks passed
Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate optimizer Optimizer rules physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
5 participants