Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Specialization Instead of ScalarValue Binary Operations #6842

Open
tustvold opened this issue Jul 4, 2023 · 2 comments
Open

Use Specialization Instead of ScalarValue Binary Operations #6842

tustvold opened this issue Jul 4, 2023 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@tustvold
Copy link
Contributor

tustvold commented Jul 4, 2023

Is your feature request related to a problem or challenge?

Currently a number of operations are implemented directly on ScalarValue, including:

  • Arithmetic
  • Logical And
  • Logical Or
  • BitAnd, BitXor, etc...
  • Comparison

Not only does this result in a huge amount of code, but also these operations don't behave the same way as their array counterparts.

For example:

  • An operation on a null doesn't yield a null
  • Floating point NaNs are not total ordered
  • No support for decimals
  • Differing support for interval arithmetic compared to the array kernels

Describe the solution you'd like

These kernels largely appear to exist for the purposes of aggregation, where the aggregated types are known statically. We should replace these uses with specialization, as done in #6800 (comment). The remaining uses should make use of the new Datum abstraction apache/arrow-rs#4393 to use the same arrow-rs kernels apache/arrow-rs#4465

Describe alternatives you've considered

No response

Additional context

#4973 tracks improving the aggregator performance
#6832 updates DF to use the Datum kernels

@tustvold tustvold added the enhancement New feature or request label Jul 4, 2023
tustvold added a commit to tustvold/arrow-datafusion that referenced this issue Jul 4, 2023
tustvold added a commit that referenced this issue Jul 6, 2023
* Deprecate ScalarValue::and, ScalarValue::or (#6842)

* Review feedback
alamb pushed a commit to alamb/datafusion that referenced this issue Jul 6, 2023
* Deprecate ScalarValue::and, ScalarValue::or (apache#6842)

* Review feedback
tustvold added a commit to tustvold/arrow-datafusion that referenced this issue Aug 21, 2023
tustvold added a commit to tustvold/arrow-datafusion that referenced this issue Aug 21, 2023
tustvold added a commit to tustvold/arrow-datafusion that referenced this issue Aug 21, 2023
tustvold added a commit that referenced this issue Aug 21, 2023
* Deprecate ScalarValue bitor, bitand, and bitxor (#6842)

* Fixes

* Format

* Fix BitAndAccumulator
tustvold added a commit to tustvold/arrow-datafusion that referenced this issue Aug 21, 2023
tustvold added a commit to tustvold/arrow-datafusion that referenced this issue Aug 22, 2023
tustvold added a commit that referenced this issue Aug 23, 2023
* Specialize SUM and AVG (#6842)

* Specialize Distinct Sum

* Review feedback

* Update sqllogictest
@alamb
Copy link
Contributor

alamb commented Sep 11, 2023

I think we have made substantial progress on this issue -- what is left to do?

@tustvold
Copy link
Contributor Author

IIRC there are some aggregates, like first and last that are not yet specialized

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants