[Epic] A collection of issues for extending the Aggregation function #12254
Comments
I wonder if we should consider where to draw the line on what aggregate functions to include in the core (i.e. should we include all these new functions?) Now that all aggregate functions use the same API, we could potentially keep more specialized functions, such as those listed here, outside the core -- either in their own crate or even their own repo -- and then have other code integrate them -- e.g. #11979 |
I started a discussion about if we should be adding all these functions directly in the core here: #12357 |
I like this idea! 🚀 |
@Weijun-H and @dmitrybugakov and @dharanad -- what do you think about creating a It would be a pretty neat way to help build out the function library in DataFusion and would show off its extensibility I could then try and integrate it into Originally from: #12476 (comment) |
Thank you @alamb for proposing this initiative. I like this idea. What do others think? |
@alamb |
I do not have a strong preference -- I think it likely depends on the use case:
However, let's not get too carried away with details at the moment. I created https://github.com/datafusion-contrib/datafusion-functions-extra and added @dmitrybugakov and @austin362667 as admins. If anyone else wants to help, let me know and we can add you too. @dmitrybugakov or @austin362667, would you be willing to set up the basic skeleton of the repo? Perhaps you could follow the model of https://github.com/datafusion-contrib/datafusion-functions-json for the readme and registration function And then try to put the |
Update: I also added @Lordworms (a longtime DataFusion contributor) per #12284 (comment) |
I agree that 'datafusion-functions-extra' is sufficient for the current scenario because our goal is to decouple the extra functions from the core and make them lightweight. Additionally, it would be easier to split 'datafusion-functions-extra' into many specific use cases if necessary. |
I believe the consensus is that we are going to implement these functions in https://github.com/datafusion-contrib/datafusion-functions-extra -- I will close the related tickets so we are not confusing people about what we are looking for |
I filed #12625 to propose moving |
Is your feature request related to a problem or challenge?
DataFusion now supports several aggregation functions, but it still lacks some common ones that are essential for a broader range of data processing tasks. To make DataFusion more versatile and capable of handling diverse workloads, it should include additional aggregation functions commonly used in data analysis, such as mode and max_by.
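As a rough illustration of what two of the functions named above compute, here is a minimal plain-Rust sketch. It is not DataFusion code: a real implementation would work on Arrow arrays through DataFusion's aggregate function API. The function names, signatures, and tie-breaking rules below are assumptions made for this example only.

```rust
use std::collections::HashMap;

/// Hypothetical `max_by(x, y)` over (x, y) rows: returns the `x` value
/// from the row with the largest `y`. On ties, `Iterator::max_by_key`
/// keeps the last maximal row (an arbitrary choice for this sketch).
fn max_by<'a>(rows: &[(&'a str, i64)]) -> Option<&'a str> {
    rows.iter().max_by_key(|(_, y)| *y).map(|(x, _)| *x)
}

/// Hypothetical `mode(x)`: returns the most frequent value.
/// Ties are broken in favor of the smallest value (an assumption).
fn mode(values: &[i64]) -> Option<i64> {
    // Count occurrences of each distinct value.
    let mut counts: HashMap<i64, usize> = HashMap::new();
    for &v in values {
        *counts.entry(v).or_insert(0) += 1;
    }
    // Pick the highest count; among equal counts prefer the smaller value.
    counts
        .into_iter()
        .max_by(|a, b| a.1.cmp(&b.1).then(b.0.cmp(&a.0)))
        .map(|(v, _)| v)
}
```

Both helpers return `None` for empty input. This mirrors the usual SQL convention that an aggregate over no rows yields NULL, though the exact behavior of the eventual functions is not specified in this thread.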
Describe the solution you'd like
max_by in Aggregation function #12252
min_by in Aggregation function #12253
kurtosis_pop in Aggregation function #12251
kurtosis(x) in Aggregation function #12250
skewness(x) in Aggregation function #12249
mode in Aggregation function #12248
entropy in Aggregation function #12247
Describe alternatives you've considered
No response
Additional context
These functions should be implemented in datafusion-contrib/datafusion-functions-extra, instead of the core