feat: refactor udf/udaf/udwf ReturnType #8183

Closed
JasonLi-cn wants to merge 3 commits

JasonLi-cn
Contributor

Which issue does this PR close?

Closes #8182

Rationale for this change

In some cases, I need to determine the output type of the function based on the input arguments of the function, but I cannot do this at present. This is because the ReturnTypeFunction provides only the data types of the input parameters.

What changes are included in this PR?

  • Add ReturnTypeFactory trait
  • ReturnTypeFunction impl ReturnTypeFactory
  • Change UDF, UDAF, UDWF's return_type from ReturnTypeFunction into ReturnTypeFactory
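
The limitation described in the rationale, and the three changes listed above, can be illustrated with a minimal self-contained sketch. Everything here is an assumption made for illustration: `DataType` is a toy stand-in for Arrow's type, the fields of `ConstantArg` are invented, and the method name `infer` and the `cast_to` function are hypothetical. The actual trait in this PR may look different.

```rust
use std::sync::Arc;

/// Toy stand-in for Arrow's `DataType` (illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Float64,
    Utf8,
}

/// Hypothetical description of a literal argument: its position and value.
#[derive(Debug, Clone, PartialEq)]
pub struct ConstantArg {
    pub index: usize,
    pub value: String,
}

/// Type-only callback, in the spirit of the existing `ReturnTypeFunction`.
pub type ReturnTypeFunction =
    Arc<dyn Fn(&[DataType]) -> Result<Arc<DataType>, String> + Send + Sync>;

/// Assumed shape of the proposed trait: it also sees constant arguments.
pub trait ReturnTypeFactory: Send + Sync {
    fn infer(
        &self,
        arg_types: &[DataType],
        constant_args: &[ConstantArg],
    ) -> Result<Arc<DataType>, String>;
}

/// Existing type-only callbacks keep working by ignoring the constants.
impl ReturnTypeFactory for ReturnTypeFunction {
    fn infer(
        &self,
        arg_types: &[DataType],
        _constant_args: &[ConstantArg],
    ) -> Result<Arc<DataType>, String> {
        (**self)(arg_types)
    }
}

/// Hypothetical `cast_to(value, 'type name')`: the output type is named by
/// the second argument, which must be a string constant.
pub struct CastToReturnType;

impl ReturnTypeFactory for CastToReturnType {
    fn infer(
        &self,
        _arg_types: &[DataType],
        constant_args: &[ConstantArg],
    ) -> Result<Arc<DataType>, String> {
        let target = constant_args
            .iter()
            .find(|c| c.index == 1)
            .ok_or_else(|| "second argument must be a constant".to_string())?;
        match target.value.as_str() {
            "int64" => Ok(Arc::new(DataType::Int64)),
            "float64" => Ok(Arc::new(DataType::Float64)),
            "utf8" => Ok(Arc::new(DataType::Utf8)),
            other => Err(format!("unsupported target type: {other}")),
        }
    }
}

/// A type-only callback that always reports `Float64`.
pub fn always_float64() -> ReturnTypeFunction {
    Arc::new(|_arg_types: &[DataType]| Ok(Arc::new(DataType::Float64)))
}
```

In this sketch the type-only callback cannot tell `cast_to(x, 'int64')` from `cast_to(x, 'utf8')`, because both calls have the same argument types; only the factory that receives the constants can.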

Are these changes tested?

Are there any user-facing changes?

github-actions bot added the logical-expr (Logical plan and expressions) and physical-expr (Physical Expressions) labels on Nov 15, 2023
Contributor

@alamb alamb left a comment

Thank you @JasonLi-cn - this is a great start. The complex_udf is especially nice as an example

CC @2010YOUY01 as I think you have been thinking about this area too

constant_args: &[ConstantArg],
) -> Result<Arc<DataType>>;
}

/// Factory that returns the function's return type given the input argument types
pub type ReturnTypeFunction =
Contributor

I wonder if you considered changing the ReturnTypeFunction to something more like

    Arc<dyn Fn(&[Expr], &dyn ExprSchemable) -> Result<Arc<DataType>> + Send + Sync>;

Now that we have made the fields of ScalarUDF non pub (in #8079) I think we have much greater leeway to improve the API
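
A rough sketch of what such a callback could look like, using toy types so that it compiles on its own. `Expr`, `DataType`, the `SchemaLookup` trait (standing in for the schema argument), and the `scale(column, factor)` function are all hypothetical and are not DataFusion's real definitions.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Toy stand-in for Arrow's `DataType` (illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Float64,
    Utf8,
}

/// Toy expression: a column reference or a literal.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Int64Literal(i64),
    Float64Literal(f64),
}

/// Hypothetical schema lookup, standing in for the schema argument.
pub trait SchemaLookup {
    fn column_type(&self, name: &str) -> Option<DataType>;
}

impl SchemaLookup for HashMap<String, DataType> {
    fn column_type(&self, name: &str) -> Option<DataType> {
        self.get(name).cloned()
    }
}

/// Callback that sees the argument expressions and the schema.
pub type ReturnTypeFunction =
    Arc<dyn Fn(&[Expr], &dyn SchemaLookup) -> Result<Arc<DataType>, String> + Send + Sync>;

/// Return type of a hypothetical `scale(column, factor)`: `Int64` only when
/// both the column and the literal factor are integers, otherwise `Float64`.
pub fn scale_return_type(
    args: &[Expr],
    schema: &dyn SchemaLookup,
) -> Result<Arc<DataType>, String> {
    let (column, factor) = match args {
        [Expr::Column(name), factor] => (name, factor),
        _ => return Err("scale expects (column, literal factor)".to_string()),
    };
    // The column's type comes from the schema, the factor's from the literal.
    let column_type = schema
        .column_type(column)
        .ok_or_else(|| format!("unknown column: {column}"))?;
    match (column_type, factor) {
        (DataType::Int64, Expr::Int64Literal(_)) => Ok(Arc::new(DataType::Int64)),
        (
            DataType::Int64 | DataType::Float64,
            Expr::Int64Literal(_) | Expr::Float64Literal(_),
        ) => Ok(Arc::new(DataType::Float64)),
        _ => Err("scale expects a numeric column and a numeric literal".to_string()),
    }
}

/// Wrap the plain function in the callback type.
pub fn scale_callback() -> ReturnTypeFunction {
    Arc::new(scale_return_type)
}
```

The point of the sketch is that the callback needs both inputs: the schema to resolve the column's type and the expression itself to inspect the literal.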

Contributor

Perhaps we could introduce the FunctionImplementation trait as proposed here: https://github.com/apache/arrow-datafusion/pull/8046/files#diff-8a327db2db945bcf6ca2b4229885532feae127e94a450600d3fac6ecdc0eeb3fR141

with the more general return types. Something like this perhaps:

/// Convenience trait for implementing ScalarUDF. See [`ScalarUDF::new_from_impl()`]
pub trait FunctionImplementation {
    /// Returns this function's name
    fn name(&self) -> &str;

    /// Returns this function's signature
    fn signature(&self) -> &Signature;

    /// return the return type of this function given the types of the arguments
    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType>;

    /// return the return type of this function given the actual arguments. This is
    /// used to implement functions where the return type is a function of the actual
    /// arguments
    fn return_type_from_args(&self, args: &[Expr], schema: &dyn ExprSchemable) -> Result<DataType> {
      // default impl would call `Self::return_type`
      todo!()
    }


    /// Invoke the function on `args`, returning the appropriate result
    fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue>;
}
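
To make the "default impl would call `Self::return_type`" idea concrete, here is a self-contained sketch with toy types. It is not DataFusion code: `Expr`, `DataType`, and the `Schema` alias are simplified stand-ins, `signature` and `invoke` are omitted, and the `Identity` and `ParseAs` functions are hypothetical.

```rust
use std::collections::HashMap;

/// Toy stand-in for Arrow's `DataType` (illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Float64,
    Utf8,
}

/// Toy expression: a column reference or a string literal.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Utf8Literal(String),
}

/// Toy schema: column name to type.
pub type Schema = HashMap<String, DataType>;

impl Expr {
    /// Resolve this expression's type against the toy schema.
    pub fn get_type(&self, schema: &Schema) -> Result<DataType, String> {
        match self {
            Expr::Column(name) => schema
                .get(name)
                .cloned()
                .ok_or_else(|| format!("unknown column: {name}")),
            Expr::Utf8Literal(_) => Ok(DataType::Utf8),
        }
    }
}

pub trait FunctionImplementation {
    fn name(&self) -> &str;

    /// Return type given only the argument types.
    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType, String>;

    /// Return type given the actual arguments. The default resolves each
    /// argument's type and delegates to `return_type`.
    fn return_type_from_args(&self, args: &[Expr], schema: &Schema) -> Result<DataType, String> {
        let arg_types = args
            .iter()
            .map(|e| e.get_type(schema))
            .collect::<Result<Vec<_>, _>>()?;
        self.return_type(&arg_types)
    }
}

/// Type-only function: relies on the default `return_type_from_args`.
pub struct Identity;

impl FunctionImplementation for Identity {
    fn name(&self) -> &str {
        "identity"
    }
    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType, String> {
        arg_types
            .first()
            .cloned()
            .ok_or_else(|| "identity expects one argument".to_string())
    }
}

/// Value-dependent function: overrides `return_type_from_args`.
pub struct ParseAs;

impl FunctionImplementation for ParseAs {
    fn name(&self) -> &str {
        "parse_as"
    }
    fn return_type(&self, _arg_types: &[DataType]) -> Result<DataType, String> {
        Err("parse_as needs the literal type name, not just argument types".to_string())
    }
    fn return_type_from_args(&self, args: &[Expr], _schema: &Schema) -> Result<DataType, String> {
        match args {
            [_, Expr::Utf8Literal(target)] => match target.as_str() {
                "int64" => Ok(DataType::Int64),
                "float64" => Ok(DataType::Float64),
                "utf8" => Ok(DataType::Utf8),
                other => Err(format!("unsupported target type: {other}")),
            },
            _ => Err("parse_as expects (value, string literal)".to_string()),
        }
    }
}
```

With this layout, existing type-only functions implement only `return_type`, while functions whose output depends on argument values override the second method.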

Contributor

alamb commented Nov 15, 2023

Another idea @JasonLi-cn might be to pursue this idea #8051 to "specialize" the function based on its arguments (and in this case the specialization could potentially also include the constants 🤔 )

alamb marked this pull request as draft on November 17, 2023 19:08
Contributor

alamb commented Nov 17, 2023

marking as draft as I think @JasonLi-cn is working on feedback now

@JasonLi-cn
Contributor Author

> marking as draft as I think @JasonLi-cn is working on feedback now

Ok, I will try to complete this task

JasonLi-cn closed this on Apr 12, 2024
Contributor

alamb commented Apr 12, 2024

For anyone following along, this was implemented in a different way, see #8624

Successfully merging this pull request may close these issues.

UDF/UDAF/UDWF: refactor ReturnType
2 participants