Is your feature request related to a problem or challenge?
In #9504, we ran into an issue where the query
select array_ndims([[[[[[[[[[[[[[[[[[[[[1]]]]]]]]]]]]]]]]]]]]])
takes much longer than it does on the main branch. I found that we recalculate the argument types every time, which is really expensive for a deeply nested expression like the one above.
https://github.com/apache/arrow-datafusion/blob/5537572820977b38719e2253f601a159deef5bc6/datafusion/expr/src/expr_schema.rs#L141-L156
UDFs now always go through return_type_from_exprs:
https://github.com/apache/arrow-datafusion/blob/5537572820977b38719e2253f601a159deef5bc6/datafusion/expr/src/udf.rs#L306-L316
The argument types are always recomputed here.
I tested locally that, after avoiding the recomputation, at least this query becomes as fast as it is on the main branch.
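To illustrate why repeated type resolution hurts on deeply nested expressions, here is a minimal toy sketch. It does not use DataFusion's types: `Expr`, `DataType`, `get_type`, `nested`, and `count_calls` are all hypothetical stand-ins, and the "each level resolves its argument twice" behaviour is an assumption made for illustration, not something measured in the linked code.

```rust
use std::cell::Cell;

// Hypothetical toy expression type; not DataFusion's `Expr`.
enum Expr {
    Literal,
    // Stands in for one level of array nesting, such as `[x]`.
    MakeArray(Box<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    List(Box<DataType>),
}

// Resolves the type of `expr`, counting every visit in `calls`.
// ASSUMPTION: each level asks for its argument's type twice, for example
// once to validate the argument and once to derive the return type.
fn get_type(expr: &Expr, calls: &Cell<u64>) -> DataType {
    calls.set(calls.get() + 1);
    match expr {
        Expr::Literal => DataType::Int64,
        Expr::MakeArray(inner) => {
            let _checked = get_type(inner, calls); // first resolution
            let arg_type = get_type(inner, calls); // repeated resolution
            DataType::List(Box::new(arg_type))
        }
    }
}

// Builds a literal wrapped in `depth` levels of nesting.
fn nested(depth: u32) -> Expr {
    let mut expr = Expr::Literal;
    for _ in 0..depth {
        expr = Expr::MakeArray(Box::new(expr));
    }
    expr
}

// Counts how many type resolutions a single top-level call triggers.
fn count_calls(depth: u32) -> u64 {
    let calls = Cell::new(0);
    get_type(&nested(depth), &calls);
    calls.get()
}

fn main() {
    for depth in [1, 5, 10, 20] {
        println!("depth {}: {} type resolutions", depth, count_calls(depth));
    }
}
```

Under that assumption the number of resolutions roughly doubles with every extra level of nesting, which is why avoiding even one redundant resolution per level makes such a large difference.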
Describe the solution you'd like
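1. We can cache the result for the same arguments and schema pair, but we would still need to check for a match every time.
2. We can add another argument, args_types: &[DataType], to return_type_from_exprs. If args_types is empty, we calculate the types; otherwise we skip the calculation.
3. We can compute the return type along with UDF creation 🤔
https://github.com/apache/arrow-datafusion/blob/5537572820977b38719e2253f601a159deef5bc6/datafusion/sql/src/expr/function.rs#L69-L72
We have the schema for ScalarFunction::new_udf, so it is possible.

Option 3 looks best if the schema is usually fixed; otherwise option 2. Option 1 would make sense if we needed to recompute args_types in many places, but that doesn't seem to be the case for now.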
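As a rough sketch of what the second and third ideas could look like, here is a toy model. It is not DataFusion's API: the `return_type_from_exprs` signature below, `ToyScalarFunction`, the simplified `Expr` and `DataType`, and the call counter are all hypothetical and exist only to show the control flow.

```rust
use std::cell::Cell;

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    List(Box<DataType>),
}

// Hypothetical toy expression; not DataFusion's `Expr`.
enum Expr {
    Literal,
    MakeArray(Box<Expr>),
}

// Toy type resolution that counts how often it runs.
fn get_type(expr: &Expr, calls: &Cell<u32>) -> DataType {
    calls.set(calls.get() + 1);
    match expr {
        Expr::Literal => DataType::Int64,
        Expr::MakeArray(inner) => DataType::List(Box::new(get_type(inner, calls))),
    }
}

// Option 2: accept precomputed argument types; empty means "not computed yet".
fn return_type_from_exprs(
    args: &[Expr],
    args_types: &[DataType],
    calls: &Cell<u32>,
) -> Option<DataType> {
    let computed: Vec<DataType>;
    let types: &[DataType] = if args_types.is_empty() {
        computed = args.iter().map(|arg| get_type(arg, calls)).collect();
        &computed
    } else {
        args_types
    };
    // Toy rule: the function returns a list of its first argument's type.
    types.first().map(|t| DataType::List(Box::new(t.clone())))
}

// Option 3: resolve the return type once, when the call node is created.
struct ToyScalarFunction {
    args: Vec<Expr>,
    return_type: Option<DataType>,
}

impl ToyScalarFunction {
    fn new(args: Vec<Expr>, calls: &Cell<u32>) -> Self {
        let return_type = return_type_from_exprs(&args, &[], calls);
        Self { args, return_type }
    }
}

fn main() {
    let calls = Cell::new(0);
    let args = vec![Expr::MakeArray(Box::new(Expr::Literal))];

    // Without precomputed types, the argument is resolved inside the call.
    let slow = return_type_from_exprs(&args, &[], &calls);
    let after_slow = calls.get();

    // With precomputed types, no further resolution happens.
    let known = vec![DataType::List(Box::new(DataType::Int64))];
    let fast = return_type_from_exprs(&args, &known, &calls);
    assert_eq!(slow, fast);
    assert_eq!(calls.get(), after_slow);

    // Option 3: later reads of `return_type` need no resolution at all.
    let func = ToyScalarFunction::new(args, &calls);
    println!("{:?} for {} argument(s)", func.return_type, func.args.len());
}
```

One thing this sketch makes visible: an empty args_types is ambiguous for a function that takes no arguments, so an Option<&[DataType]> parameter might be a cleaner way to signal "not computed yet". That is only a design thought, not something proposed in the issue.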
Describe alternatives you've considered
Any other, better design is welcome.
Additional context
To reproduce, you can play around on commit 970dd7d