-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: iterative scalar procesing #2314
chore: iterative scalar procesing #2314
Conversation
} | ||
ColumnarValue::Array(arr) => Ok(ColumnarValue::Array(scalar_iter_to_array( | ||
(0..arr.len()).map(|idx| -> Result<ScalarValue, ExtensionError> { | ||
Ok(op(ScalarValue::try_from_array(arr, idx)?)?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to avoid using try_from_array
. There's quite a bit of work being done in there that'll slow things down in a hot loop like this.
I'd suggest that we pass in the return type to the get_nth
function. Then we can try to extract the value directly, skipping all of the checks that happen in try_from_array
. Similarly, scalar_iter_to_array
could skip peeking at the elements if it knew the return type.
I found this gem in the
Which means, that if we want to have ScalarUDFs that are fallible, we're not going to get too much benefit out of too much extra optimization because as long as we want to accept the possibility of errors (which I think we should for some of these options), we're really just optimizing things for health. |
This should address some of the concerns #2304
Pro: we iterate through arrays (when we're already going to deal with
scalar values) exactly once, and there's no intermediate data
structure. It's pretty simple from the perspective of the call sites.
Con: I had to copy paste a bunch of datafusion casting code in (though
I'm not worried about it) but I haven't made the tests compil yet
either.