DEPR: SeriesGroupBy._aggregate_named #41090
Tracking down all the places where we do this pinning of "name": disabling them all breaks 16 tests. xref #15062: "I also think that the .name arg is somewhat undocumented when passed to .apply, if you want to add a note there (or small example), might be useful."
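Here is a small example of the undocumented behavior referenced above. Inside a SeriesGroupBy.apply UDF, the group Series carries the group key as its .name. The frame and column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 5]})

# Each group passed to the UDF has its group key pinned as `.name`,
# so the lambda can read the key without it being passed explicitly.
labels = df.groupby("key")["val"].apply(lambda s: f"{s.name}: total={s.sum()}")
```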
Discussed at the April dev meeting; we decided this was probably too breaking.
@rhshadrach I'd like to revive this and am hoping to get you on board. All the places in the groupby code where we do
I agree with this in its entirety. I personally haven't found a use for the group keys, but I don't have a sense of how heavily this is used. Without this, I think users would currently be left iterating over a groupby object themselves and combining the results (essentially reimplementing apply / agg / transform). While I'm hesitant to add arguments to apply / agg / transform, it seems to me having a
I'd be OK with telling users to do this; it's not as if our implementation does anything fancy for performance. And we do have an uncomfortable amount of guessing when it comes time to wrap/concat the results.
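A minimal sketch of the "iterate the groupby yourself" alternative, assuming the UDF is rewritten to take the group key as an explicit argument. flip_b and the data are illustrative, not pandas API. Writing the loop by hand also makes the combining step explicit rather than leaving pandas to guess how to wrap/concat the results.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 10, 20], index=list("vwxyz"), name="val")
keys = pd.Series(["a", "a", "a", "b", "b"], index=s.index)


def flip_b(group: pd.Series, key: str) -> pd.Series:
    # The key arrives as an explicit argument; nothing is read from group.name.
    return group if key == "a" else -group


# Iterate the groupby ourselves and choose how to combine the pieces:
# here we concatenate them and restore the original row order.
pieces = [flip_b(group, key) for key, group in s.groupby(keys)]
result = pd.concat(pieces).reindex(s.index)
```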
I'd prefer this to the status quo, but I share your hesitancy to add keywords. A separate method might be even cleaner? Maybe we just deprecate this behavior entirely (how?) and wait to add an alternative until someone complains? It's ugly as sin, but we could introspect a UDF for keywords.
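A rough sketch of what introspecting a UDF for keywords could look like, using the standard library inspect module. accepts_name and call_udf are hypothetical helpers, not anything pandas provides; the idea is to pass the group key only to functions that declare a name parameter (or accept **kwargs).

```python
import inspect
from typing import Any, Callable


def accepts_name(func: Callable) -> bool:
    """Hypothetical check: can `func` receive the group key as `name=`?"""
    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):
        # Some builtins and C-implemented callables expose no signature.
        return False
    if "name" in params:
        # A positional-only `name` cannot be passed by keyword.
        return params["name"].kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        )
    # Fall back to **kwargs, which would also swallow `name=`.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def call_udf(func: Callable, group: Any, key: Any) -> Any:
    # Pass the key only to UDFs that declare they want it.
    if accepts_name(func):
        return func(group, name=key)
    return func(group)
```

One drawback this makes visible: callables without a retrievable signature silently fall back to not receiving the key, and any UDF that happens to take **kwargs would start getting an unexpected argument.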
We have 6 places in the groupby code where we do this. Of those, disabling it in DFGB._transform_general (twice) and in filter does not break any tests. The filter docstring does mention this pinning, so removing it there would need a deprecation despite not being tested. The usage in _aggregate_named will be easy to deprecate because _aggregate_named is only called from one place, and only after the non-pinning variant raises. The usage in SGB._transform_general is needed for test_transform_lambda_with_datetimetz, which actually looks like a pretty legitimate use case for something like this.
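An example in the spirit of test_transform_lambda_with_datetimetz (the data here is illustrative). Each group's key is a timezone string, and the transform UDF reads it from x.name to localize that group's timestamps. This only works because SeriesGroupBy pins the key onto each group before calling the function.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "time": [
            pd.Timestamp("2010-07-15 03:14:45"),
            pd.Timestamp("2010-11-19 18:47:06"),
        ],
        "timezone": ["Etc/GMT+4", "America/New_York"],
    }
)

# x.name is the group key (a timezone string) pinned onto each group,
# so each group is localized to its own timezone.
result = df.groupby("timezone")["time"].transform(
    lambda x: x.dt.tz_localize(x.name)
)
```

Because the groups end up in different timezones, the combined result may come back as object dtype depending on the pandas version.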
With the exception of introspecting, I'd be good with any of those options, and I could probably even be dragged into being okay with introspection. A separate method seems a little odd to me because it would be the same implementation except for what is passed to the UDF. I'm good with deprecating and seeing if users raise any issues. I suspect this is a feature that users aren't aware of, let alone use, but I could very much be wrong here.
OK, let's start with just deprecating and go from there.
For the places where we do pinning other than _aggregate_named, we could do the deprecation by adding a _pin_name keyword and doing something like:
and put the
Dangit. It looks like GroupBy.hist relies on the name being pinned in apply_groupwise. It doesn't raise if that pinning is disabled; it just silently gives incorrect results (e.g. test_groupby_hist_series_with_legend).
It looks like #25457 is an example of a bug caused by this name pinning.
AFAICT this exists to allow users to apply a different function to each group. I could be convinced that this is worth supporting, but it shouldn't be shoehorned into SeriesGroupBy.aggregate the way it is. AFAICT this is not documented, and only one test relies on it (parametrized over 4 dtypes):
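For comparison, here is a minimal sketch of applying a different function to each group without relying on name pinning, by dispatching explicitly on the key yielded from iteration. The funcs mapping and data are hypothetical. The name-pinning style this replaces would presumably look something like agg(lambda x: funcs[x.name](x)).

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [1, 2, 3, 4]})

# Hypothetical mapping from group key to the reduction wanted for that group.
funcs = {"a": lambda x: x.sum(), "b": lambda x: x.max()}

# Dispatch on the key yielded by iteration, instead of on x.name.
result = pd.Series(
    {key: funcs[key](group) for key, group in df.groupby("key")["val"]},
    name="val",
)
```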