-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specialize Avg and Sum accumulators (#6842) #7358
Conversation
let delta = sum_batch(values, &self.sum.get_datatype())?; | ||
self.sum = self.sum.sub(&delta)?; | ||
if let Some(x) = sum(values) { | ||
self.sum = Some(self.sum.unwrap() - x); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels instinctively wrong, as it will accumulate errors over time... I'm not really sure what to do about this though...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is expected for floats (otherwise we would need to keep intermediate values). Switching to decimal should allow for precise values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will be looking into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The calculation looks correct. There is indeed some error accumulation when doing incremental calculations, but it is unavoidable (and very very rarely causes issues in practice)
use datafusion_expr::aggregate_function::sum_type_of_avg; | ||
use datafusion_expr::type_coercion::aggregates::avg_return_type; | ||
|
||
fn test_with_pre_cast(array: ArrayRef, expected: ScalarValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is necessary because generic_test_op would call Avg::new which would then not have the correct arguments. This replicates the logic in build_in
// instantiate specialized accumulator based for the type | ||
match (&self.sum_data_type, &self.rt_data_type) { | ||
(Float64, Float64) => { | ||
Ok(Box::new(AvgAccumulator::new(self.pre_cast_to_sum_type))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole precast thing seems like a hack, imo this should be being handled by the type coercion machinery, not internal to the aggregator...
Marking as draft as I'd like to get #7369 in first |
63d9971
to
1fd74c8
Compare
|
||
impl<T: ToByteSlice> std::hash::Hash for Hashable<T> { | ||
fn hash<H: std::hash::Hasher>(&self, state: &mut H) { | ||
self.0.to_byte_slice().hash(state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other cases I think we only do this for floats and use the state.hash_one(self.0)
in other cases (for primitives it's faster)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does use ahash, it overrides the BuildHasher used by the HashSet. I think we could definitely do something better here, but in the absence of benchmarks I'm keen to go with simple and we can always optimise it later down the line. Regardless this should be significantly faster than the prior approach
Edit: If someone really care about the performance of DistinctSum, implementing a GroupsAccumulator will likely yield far greater performance than any incremental tweaking of this Accumulator-based version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edit: If someone really care about the performance of DistinctSum, implementing a GroupsAccumulator will likely yield far greater performance than any incremental tweaking of this Accumulator-based version
yeah true :)
Ok(Box::new(DistinctSumAccumulator::try_new(&self.data_type)?)) | ||
macro_rules! helper { | ||
($t:ty, $dt:expr) => { | ||
Ok(Box::new(DistinctSumAccumulator::<$t>::try_new(&$dt)?)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can do the same for DistinctCountAccumulator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite possibly, right now I'm just doing the minimum to be able to remove the ScalarValue arithmetic kernels 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DistinctCountAccumulator
also uses ScalarValue
;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But not arithmetic 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha 😁 let me follow up this PR then ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @tustvold -- these changes make sense to me.
I think we should run some basic performance tests too if we have any that cover queries that use these functions (like SUM DISTINCT and using sliding sum in window functions). Maybe @metesynnada or @ozankabak know of some benchmarks we can run that cover it
target_scale: *target_scale, | ||
})), | ||
_ => not_impl_err!( | ||
"AvgGroupsAccumulator for ({} --> {})", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"AvgGroupsAccumulator for ({} --> {})", | |
"AvgAccumulator for ({} --> {})", |
} | ||
downcast_sum!(self, helper) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the multiple levels of macros is concise I'll give you that but I do find it hard to follow. Maybe that is ok as we don't expect this to be changing
$FN, | ||
))) | ||
}}; | ||
/// Sum only supports a subset of numeric types, instead relying on type coercion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might help to document what this macro does (aka calls helper macro given a s (what is s? The aggregate?))
We will discuss this tomorrow and circle back to you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great 👍
Running the TPCH benchmarks
|
Which issue does this PR close?
Part of #6842
Rationale for this change
This makes it easier to see what is going on, and avoids using ScalarValue arithmetic
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?