Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: iterative scalar procesing #2314

Merged
merged 8 commits into from
Dec 28, 2023

Conversation

tychoish
Copy link
Contributor

This should address some of the concerns #2304

Pro: we iterate through arrays (when we're already going to deal with
scalar values) exactly once, and there's no intermediate data
structure. It's pretty simple from the perspective of the call sites.

Con: I had to copy paste a bunch of datafusion casting code in (though
I'm not worried about it) but I haven't made the tests compil yet
either.

@tychoish tychoish changed the base branch from main to tycho/scalar-value-handling December 27, 2023 22:20
@tychoish tychoish requested a review from scsmithr December 28, 2023 05:55
}
ColumnarValue::Array(arr) => Ok(ColumnarValue::Array(scalar_iter_to_array(
(0..arr.len()).map(|idx| -> Result<ScalarValue, ExtensionError> {
Ok(op(ScalarValue::try_from_array(arr, idx)?)?)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to avoid using try_from_array. There's quite a bit of work being done in there that'll slow things down in a hot loop like this.

I'd suggest that we pass in the return type to the get_nth function. Then we can try to extract the value directly, skipping all of the checks that happen in try_from_array. Similarly, scalar_iter_to_array could skip peeking at the elements if it knew the return type.

@tychoish
Copy link
Contributor Author

I found this gem in the try_unary docs:

Note: LLVM is currently unable to effectively vectorize fallible operations

Which means, that if we want to have ScalarUDFs that are fallible, we're not going to get too much benefit out of too much extra optimization because as long as we want to accept the possibility of errors (which I think we should for some of these options), we're really just optimizing things for health.

@tychoish tychoish merged commit 7435a23 into tycho/scalar-value-handling Dec 28, 2023
13 checks passed
@tychoish tychoish deleted the tycho/lazy-scalar-handling branch December 28, 2023 17:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants