[FEA] Support org.apache.spark.sql.catalyst.expressions.ArrayExists #4815
Comments
This should end up being an ArrayTransform on the lambda function followed by an array reduction using any. We could hack it the same way we do for array_max and array_min, but CUDF is adding rapidsai/cudf#9621, and we should be able to switch over to that instead. That new API also supports the any aggregation. The API is also not specific to lists/arrays, so we could avoid copying the result of the higher-order function into an array. We could probably just do the reduction directly on the result and the offsets in the input.
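To make the "reduce directly on the result and the offsets" idea concrete, here is a minimal pure-Python sketch. It is only a model of the data layout, not the cuDF API: `segmented_any`, `flat`, and `offsets` are hypothetical names, and nulls are ignored for simplicity.

```python
from typing import List


def segmented_any(values: List[bool], offsets: List[int]) -> List[bool]:
    """OR-reduce each segment values[offsets[i]:offsets[i + 1]].

    `offsets` has one more entry than the number of rows, in the same
    style as a list column's offsets. An empty segment reduces to False.
    """
    return [any(values[start:end]) for start, end in zip(offsets, offsets[1:])]


# Three rows stored as one flat child column plus offsets:
#   row 0 -> [1, 5], row 1 -> [], row 2 -> [7, 8, 9]
flat = [1, 5, 7, 8, 9]
offsets = [0, 2, 2, 5]

# Step 1: evaluate the lambda (here x > 6) over the flat child data.
predicate_result = [x > 6 for x in flat]

# Step 2: reduce per row using the input offsets; no intermediate
# array of booleans per row has to be materialized.
row_exists = segmented_any(predicate_result, offsets)
```

The point of the sketch is that the predicate output never needs to be wrapped back into a list column; the original offsets are enough to drive the reduction.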
IIUC we could implement ArrayAggregate on top of rapidsai/cudf#9621.
For example:
We do not support the generic For this specific case it is not too bad. We would have to be sure that And this is just to try and match the code for If we want to support
This PR implements ArrayExists. It has two major phases:
1. Apply the function to produce an array of Booleans.
2. Run a segmented reduce with ANY to determine whether any of the values is true.

Spark 3.x defaults to three-valued logic (3VL):
- if any element is true, the array maps to true
- if no element is true and there is at least one null, the array maps to null
- if no element is true and none is null, the array maps to false

Legacy mode uses two-valued logic (2VL):
- if any element is true, the array maps to true
- if no element is true, the array maps to false

Closes #4815

Signed-off-by: Gera Shegalov <gera@apache.org>
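The 3VL and 2VL rules described in the PR can be sketched for a single row as follows. This is a plain-Python illustration of the semantics only, not the plugin's implementation; `array_exists` and its `three_valued_logic` flag are hypothetical names, and `None` stands in for a SQL null.

```python
from typing import List, Optional


def array_exists(pred_results: List[Optional[bool]],
                 three_valued_logic: bool = True) -> Optional[bool]:
    """Reduce one row's predicate results following the rules above.

    pred_results holds the lambda's output per element; None models null.
    """
    # Any true element wins in both modes.
    if any(r is True for r in pred_results):
        return True
    # 3VL only: no true element, but at least one null -> null.
    if three_valued_logic and any(r is None for r in pred_results):
        return None
    # No true element (and, in 3VL, no null either) -> false.
    return False


default_mode = [
    array_exists([False, None, True]),   # a true element is present
    array_exists([False, None]),         # no true, one null
    array_exists([False, False]),        # no true, no null
]
legacy_mode = [
    array_exists([False, None, True], three_valued_logic=False),
    array_exists([False, None], three_valued_logic=False),
    array_exists([False, False], three_valued_logic=False),
]
```

The two modes differ only in the middle case: with no true element and at least one null, 3VL yields null while legacy 2VL yields false.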
I wish we could support org.apache.spark.sql.catalyst.expressions.ArrayExists.
Mini repro:
Unsupported messages: