Improved UDF/UDAF interfaces #2163

Closed
miguno opened this issue Nov 16, 2018 · 11 comments
Labels
enhancement user-defined-functions Tickets about UDF, UDAF, UDTF

Comments

@miguno
Contributor

miguno commented Nov 16, 2018

Background

KSQL supports UDFs and UDAFs today. However, certain limitations reduce their usefulness in practice, especially for UDAFs. This ticket is about addressing the most commonly reported ones.

UDF limitations

Related:

UDAF limitations

UDAFs in particular are currently very limited, and we should consider prioritizing work on better UDAFs over UDFs.

  • A UDAF accepts only a single input parameter.
    • Add the ability to use generic or wildcard type parameters and return types.
    • Allow for varargs inputs (possibly less urgent for UDAF than for UDF)
  • You can store only a single field between UDAF invocations, and that field must have the same data type as the final result. This makes it impractical to implement even a simple AVERAGE function, because one cannot store both running_total and record_count values.
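
To make the AVERAGE limitation concrete, here is a minimal sketch of what a UDAF with a separate intermediate type could look like. The `Udaf<I, A, O>` interface and all names below are hypothetical and only illustrate the idea; this is not the existing KSQL API.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch only: a hypothetical UDAF shape, not the KSQL API. */
final class AverageUdafSketch {

    /** Hypothetical interface: input I, intermediate state A, final output O. */
    interface Udaf<I, A, O> {
        A initialize();
        A aggregate(I value, A state);
        O map(A state); // converts the intermediate state into the final result
    }

    /** Intermediate state holding both fields that AVERAGE needs. */
    static final class SumAndCount {
        final double runningTotal;
        final long recordCount;

        SumAndCount(double runningTotal, long recordCount) {
            this.runningTotal = runningTotal;
            this.recordCount = recordCount;
        }
    }

    static final class Average implements Udaf<Double, SumAndCount, Double> {
        @Override
        public SumAndCount initialize() {
            return new SumAndCount(0.0, 0L);
        }

        @Override
        public SumAndCount aggregate(Double value, SumAndCount state) {
            return new SumAndCount(state.runningTotal + value, state.recordCount + 1);
        }

        @Override
        public Double map(SumAndCount state) {
            if (state.recordCount == 0) {
                return null; // no records seen, so there is no average
            }
            return state.runningTotal / state.recordCount;
        }
    }

    /** Tiny driver that feeds a list of inputs through a UDAF. */
    static <I, A, O> O run(Udaf<I, A, O> udaf, List<I> inputs) {
        A state = udaf.initialize();
        for (I input : inputs) {
            state = udaf.aggregate(input, state);
        }
        return udaf.map(state);
    }

    public static void main(String[] args) {
        System.out.println(run(new Average(), Arrays.asList(1.0, 2.0, 6.0))); // prints 3.0
    }
}
```

The key point is that the state type (`SumAndCount`) differs from the result type (`Double`), which is exactly what the single-field restriction rules out.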
@miguno
Contributor Author

miguno commented Nov 16, 2018

cc @blueedgenick

@miguno miguno added the user-defined-functions Tickets about UDF, UDAF, UDTF label Nov 16, 2018
@ankitchiplunkar

Would really appreciate increased functionality in UDAFs to perform basic operations such as mean, standard deviation, etc.

For comparison, here is a sample Spark UDAF that calculates the geometric mean: https://docs.databricks.com/spark/latest/spark-sql/udaf-scala.html It has two extra features that make it possible to calculate the mean or standard deviation for aggregates:

  1. bufferSchema: Used to store multiple fields across UDAF invocations
  2. evaluate(): Method that converts the buffer into the final return value
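
The buffer-plus-evaluate pattern can be sketched in plain Java as follows. This is an illustration only, assuming a buffer with a count and a running product; the class and method names are made up for this sketch and are neither Spark's nor KSQL's API.

```java
/** Sketch only: buffer + evaluate pattern for a geometric mean, no Spark or KSQL dependency. */
final class GeometricMeanSketch {

    /** Multi-field buffer kept between invocations. */
    static final class Buffer {
        long count = 0L;
        double product = 1.0;
    }

    /** Folds one input value into the buffer. */
    static void update(Buffer buffer, double value) {
        buffer.count++;
        buffer.product *= value;
    }

    /** Combines two partial buffers, e.g. from different partitions. */
    static void merge(Buffer into, Buffer other) {
        into.count += other.count;
        into.product *= other.product;
    }

    /** Converts the buffer into the final result; NaN if nothing was aggregated. */
    static double evaluate(Buffer buffer) {
        if (buffer.count == 0) {
            return Double.NaN;
        }
        return Math.pow(buffer.product, 1.0 / buffer.count);
    }

    public static void main(String[] args) {
        Buffer buffer = new Buffer();
        update(buffer, 2.0);
        update(buffer, 8.0);
        // Geometric mean of 2 and 8 is sqrt(16), i.e. about 4.
        System.out.println(evaluate(buffer));
    }
}
```

A running product can overflow or underflow on long streams; a production version would more likely accumulate a sum of logarithms instead.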

@miguno
Contributor Author

miguno commented Nov 21, 2018

Thanks for the feedback @ankitchiplunkar !

If you don't mind, could you also give your upvote to this feature request by giving a 👍 reaction to the first message in this thread? This allows us to automatically track the number of upvotes per feature.

In addition to what you said above, which specific UDAFs are you interested in? Beyond allowing users to provide their own (and making it easier to implement these, which is what this feature request covers), I am asking because we can also support some commonly requested UDAFs out of the box.

@ankitchiplunkar

@miguno we wanted to perform basic stats on a grouped data stream, and standard deviation is the first use case. We have just open-sourced a UDF which performs basic math operations and in turn makes it possible to calculate the standard deviation.
Would appreciate your input.
https://github.com/tokenanalyst/ksql-math-udf
https://twitter.com/thetokenanalyst/status/1065597747993165824

@miguno
Contributor Author

miguno commented Nov 23, 2018

@ankitchiplunkar: Feel free to send us a PR to contribute any such UDFs to this repo, so they are included out of the box.

@ankitchiplunkar

@miguno for sure, will create a PR containing these UDFs.

@miguno miguno changed the title from "Improved UDF/UDAF functionality" to "Improved UDF/UDAF interfaces" Dec 5, 2018
@agavra agavra self-assigned this Jan 18, 2019
@blueedgenick
Contributor

For the sake of being explicit, we should note here that UDAFs would also greatly benefit from some of the capabilities called out for UDFs in the issue description above.
Specifically:

  • the ability to use generic or wildcard type parameters and return types
  • varargs inputs (though this one might be less urgent for UDAF than for UDF)
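
As a rough illustration of those two capabilities, the sketch below shows an aggregate with a generic type parameter and another that takes varargs inputs. All class and method names are hypothetical and chosen for this example only; they are not part of KSQL.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: hypothetical aggregates showing generic types and varargs. */
final class GenericVarargsUdafSketch {

    /** Generic aggregate: collects values of any column type T into a list. */
    static final class CollectList<T> {
        List<T> initialize() {
            return new ArrayList<>();
        }

        List<T> aggregate(List<T> state, T value) {
            state.add(value);
            return state;
        }
    }

    /** Varargs aggregate: adds up any number of numeric columns per record. */
    static final class SumColumns {
        double initialize() {
            return 0.0;
        }

        double aggregate(double state, double... columns) {
            for (double column : columns) {
                state += column;
            }
            return state;
        }
    }

    public static void main(String[] args) {
        CollectList<String> collect = new CollectList<>();
        List<String> names = collect.initialize();
        names = collect.aggregate(names, "a");
        names = collect.aggregate(names, "b");
        System.out.println(names); // prints [a, b]

        SumColumns sum = new SumColumns();
        double total = sum.initialize();
        total = sum.aggregate(total, 1.0, 2.0); // record with two columns
        total = sum.aggregate(total, 3.0);      // record with one column
        System.out.println(total); // prints 6.0
    }
}
```

With generics, one implementation serves every column type instead of one class per type; with varargs, the same aggregate accepts a variable number of input columns.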

@blueedgenick
Contributor

One more that we somehow left off the original list:

  • make ProcessorContext available inside a UDF implementation

@peoplemerge

+1

@kewaltyagi

+1

@agavra
Contributor

agavra commented Oct 29, 2019

🎉 I believe these are all addressed in the upcoming release of KSQL, and more! This was a huge team effort, with @vpapavas and @purplefox contributing some features around this as well.

I'm going to close out this ticket as it's been mostly addressed, but feel free to open more targeted feature requests or bug reports if you run into specific issues!

@agavra agavra closed this as completed Oct 29, 2019

6 participants