-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: refactor udf/udaf/udwf ReturnType #8183
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @JasonLi-cn - this is a great start. THe complex_udf
is especially nice as an example
CC @2010YOUY01 as I think you have been thinking about this area too
constant_args: &[ConstantArg], | ||
) -> Result<Arc<DataType>>; | ||
} | ||
|
||
/// Factory that returns the functions's return type given the input argument types | ||
pub type ReturnTypeFunction = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if you considered changing the ReturnTypeFunction
to something more like
Arc<dyn Fn(&[Expr], &dyn ExprSchemable) -> Result<Arc<DataType>> + Send + Sync>;
Now that we have made the fields of ScalarUDF
non pub (in #8079) I think we have much greater leeway to improve the API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could introduce the FunctionImplementation
trait as proposed here: https://github.com/apache/arrow-datafusion/pull/8046/files#diff-8a327db2db945bcf6ca2b4229885532feae127e94a450600d3fac6ecdc0eeb3fR141
with the more general return types. Something like this perhaps:
/// Convenience trait for implementing ScalarUDF. See [`ScalarUDF::new_from_impl()`]
pub trait FunctionImplementation {
/// Returns this function's name
fn name(&self) -> &str;
/// Returns this function's signature
fn signature(&self) -> &Signature;
/// return the return type of this function given the types of the arguments
fn return_type(&self, arg_types: &[DataType]) -> Result<DataType>;
/// return the return type of this function given the actual arguments. This is
/// used to implement functions where the return type is a function of the actual
/// arguments
fn return_type_from_args(&self, args: &[Expr], schemabe: &dyn ExprSchemable) -> Result<DataType> {
// default impl would call `Self::return_type`
todo!()
}
/// Invoke the function on `args`, returning the appropriate result
fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue>;
}
Another idea @JasonLi-cn might be to pursue this idea #8051 to "specialize" the function based on its arguments (and in this case the specialization could potentially also include the constants 🤔 ) |
marking as draft as I think @JasonLi-cn is working on feedback now |
Ok, I will try to complete this task |
FOr anyone following along, this was implemented in a different way, see #8624 |
Which issue does this PR close?
Closes #8182
Rationale for this change
In some cases, I need to determine the output type of the function based on the input arguments of the function, but I cannot do this at present. This is because the ReturnTypeFunction provides only the data types of the input parameters.
What changes are included in this PR?
ReturnTypeFactory
traitReturnTypeFunction
implReturnTypeFactory
return_type
fromReturnTypeFunction
intoReturnTypeFactory
Are these changes tested?
Are there any user-facing changes?