Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move datafusion_array_function specific rewrite rules like to datafusion_functions_array crate #9519

Closed
Tracked by #9285
alamb opened this issue Mar 9, 2024 · 2 comments · Fixed by #9583
Closed
Tracked by #9285
Assignees
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Mar 9, 2024

Is your feature request related to a problem or challenge?

As we port over functions out of the datafusion-physical-expr crate, we have discovered several parts of ExprSimplif that rely on semantics of the array functions themselves

For example, the rewrite from array1 @> array2 to array_has_all(array1, array2)in rewrite_array_has_all_operator_to_func`
([source link](https://github.com/apache/arrow-datafusion/blob/5537572820977b38719e2253f601a159deef5bc6/datafusion/optimizer/src/analyzer/rewrite_expr.rs#L137

As @jayzhan22 noted on #9496 #9496 (comment), this rewrite is not correct if users supply their own implementation of array_has_all as that implementation may not be equal to DataFusion's array_has_all function. Thus the rule only makes sense when using the datafusion implementation

Describe the solution you'd like

Move the array_function specific rules such as rewrite_array_has_all_operator_to_func to the dataufusion_array_functions

Describe alternatives you've considered

  1. Create a a User Defined Analyzer pass (defined in datafusion_functions_array) with the desired rewrite
  2. Register the datafusion_function_array specific rewrite when the array functions in datafusion_array_function are registered (somewhere near here

Note you can register an analyzer rule via SessionState::add_analyzer_rule

Additional context

No response

@jayzhan211
Copy link
Contributor

Should we change add_analyzer_rule to a map structure to avoid duplication similar to register_udf?

@alamb
Copy link
Contributor Author

alamb commented Mar 11, 2024

Should we change add_analyzer_rule to a map structure to avoid duplication similar to register_udf?

I think the order of the rules is important. And some optimizer rules now are run multiple times

However, we may need to add some new way to add analyzer rules somewhere other than after all the other ("push_front?) maybe 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
2 participants