cryptic DataFrame.agg error when using dictionaries #14421
Please show the error. Note that we have no support for Python 3.6 at the moment (it may work, but it is not tested at all).
Sorry, I've updated the issue with the error message. I believe Python 3.6.0 is a detail; I encountered this error on 3.5 too.
Yes, 3.6 doesn't matter. How is that cryptic? You get a KeyError.
When aggregating a DataFrame with a dict mapping names to functions, each key has to be a valid column name.
So the name of the column in the dict has to be the name of a column in the … Apologies if there is already an option to do so.
A list works because it applies every function to all columns, while a dict selectively maps each column to its function(s). Pretty straightforward; you are suggesting something even more confusing.
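To make the list-versus-dict distinction concrete, here is a small sketch using the DataFrame from later in this thread. String function names such as `"sum"` are used instead of `np.sum`; that substitution is a choice made for this sketch.

```python
import pandas as pd

df = pd.DataFrame({"u": [2, 1, 4, 2, 5], "v": [3, 1, 5, 2, 3], "a": ["a", "a", "b", "a", "b"]})
grouped = df.groupby("a")

# List: every function is applied to every non-grouping column,
# giving MultiIndex columns (column, function name).
by_list = grouped.agg(["sum", "mean"])

# Dict: each key must be an existing column and selects the function
# for that column only; columns not listed are left out of the result.
by_dict = grouped.agg({"u": "sum", "v": "max"})

print(by_list)
print(by_dict)
```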
Sure. Until you try doing this:
```python
import numpy as np
import pandas as pd

func1 = lambda x: x.sum() / x.std()
func2 = lambda x: x.mean() / x.std()
(
    pd.DataFrame({"u": [2, 1, 4, 2, 5], "v": [3, 1, 5, 2, 3], "a": ["a", "a", "b", "a", "b"]})
    .groupby("a")
    .agg([np.sum, np.mean, func1, func2])
)
```
and you see it throwing this error: …
I don't think that aggregating by some arbitrary function and renaming the aggregated column is that weird...
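Assuming the error in question concerns duplicate function names, a plausible cause is that `agg` labels each result by the function's `__name__`, and every lambda is named `"<lambda>"`, so `func1` and `func2` collide. Later pandas releases number duplicate lambdas instead of raising, but the labels stay uninformative. A minimal sketch that sidesteps this by giving the functions real names with `def` (the names `sum_over_std` and `mean_over_std` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"u": [2, 1, 4, 2, 5], "v": [3, 1, 5, 2, 3], "a": ["a", "a", "b", "a", "b"]})

# Both lambdas share the name "<lambda>", which agg would use as the label.
func1 = lambda x: x.sum() / x.std()
func2 = lambda x: x.mean() / x.std()
print(func1.__name__, func2.__name__)  # prints: <lambda> <lambda>


# Named functions: each function's __name__ becomes its sub-column label.
def sum_over_std(x):
    return x.sum() / x.std()


def mean_over_std(x):
    return x.mean() / x.std()


result = df.groupby("a").agg(["sum", "mean", sum_over_std, mean_over_std])
print(result)
```

The output names are then fixed by the function names, which is exactly the limitation the rest of the thread discusses.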
What I mean is something like this:
```python
(
    pd.DataFrame({"u": [2, 1, 4, 2, 5], "v": [3, 1, 5, 2, 3], "a": ["a", "a", "b", "a", "b"]})
    .groupby("a")
    # agg2 is the proposed method; it does not exist in pandas
    .agg2({"s": np.sum, "m": np.mean, "f1": func1, "f2": func2})
)
```
This would return an object with (in this case) a MultiIndex on the column level, with s, m, f1 and f2 below each column, which would be exactly like …
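With existing pandas API, the `agg2` behavior proposed above can be approximated by a small wrapper. `agg_named` below is a hypothetical helper, not part of pandas: it runs named aggregation (keyword = output name, available in pandas 0.25 and later) on each value column, then concatenates the pieces so the output names sit below each column.

```python
import pandas as pd


def agg_named(df, by, funcs):
    """Hypothetical helper, not a pandas API: apply every function in
    `funcs` (output name -> function) to every column except `by`,
    returning MultiIndex columns (column, output name)."""
    grouped = df.groupby(by)
    value_cols = [c for c in df.columns if c != by]
    # Named aggregation on a SeriesGroupBy: each keyword names one output column.
    pieces = {col: grouped[col].agg(**funcs) for col in value_cols}
    # Concatenating a dict of DataFrames along axis=1 uses the dict keys
    # as the outer column level.
    return pd.concat(pieces, axis=1)


df = pd.DataFrame({"u": [2, 1, 4, 2, 5], "v": [3, 1, 5, 2, 3], "a": ["a", "a", "b", "a", "b"]})
funcs = {
    "s": "sum",
    "m": "mean",
    "f1": lambda x: x.sum() / x.std(),
    "f2": lambda x: x.mean() / x.std(),
}
out = agg_named(df, "a", funcs)
print(out)
```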
Please read the docs: you can do exactly that if you use a Series groupby.
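The dict-based renaming on a Series groupby referred to here was deprecated in pandas 0.20 and removed in 1.0. Assuming a current pandas (0.25 or later), the same result comes from named aggregation, where each keyword argument names an output column. A minimal sketch with the thread's data:

```python
import pandas as pd

df = pd.DataFrame({"u": [2, 1, 4, 2, 5], "v": [3, 1, 5, 2, 3], "a": ["a", "a", "b", "a", "b"]})

# Keyword = output column name, value = function (or a known function name).
out = df.groupby("a")["u"].agg(s="sum", m="mean", f1=lambda x: x.sum() / x.std())
print(out)
```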
I know it works exactly this way with …
You can pass a nested dictionary.
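OK, so this does exactly what I wanted:
```python
import numpy as np
import pandas as pd

func1 = lambda x: x.sum() / x.std()
func2 = lambda x: x.mean() / x.std()
d = {"s": np.sum, "m": np.mean, "f1": func1, "f2": func2}
# Nested dict {column: {output name: function}}; this renaming form was
# deprecated in pandas 0.20 and removed in 1.0.
(
    pd.DataFrame({"u": [2, 1, 4, 2, 5], "v": [3, 1, 5, 2, 3], "a": ["a", "a", "b", "a", "b"]})
    .groupby("a")
    .agg({"u": d, "v": d})
)
```
Would you still think it's completely pointless to have something like …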
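The nested-dict form above relies on renaming through `agg` dicts, which pandas removed in 1.0. A sketch of an equivalent in current pandas (0.25 and later) uses DataFrame named aggregation, where each keyword maps an output name to an `(input column, function)` tuple. The flat `"u_s"`-style names and the `def`-defined functions are choices made for this sketch:

```python
import pandas as pd

df = pd.DataFrame({"u": [2, 1, 4, 2, 5], "v": [3, 1, 5, 2, 3], "a": ["a", "a", "b", "a", "b"]})


def sum_over_std(x):
    return x.sum() / x.std()


def mean_over_std(x):
    return x.mean() / x.std()


d = {"s": "sum", "m": "mean", "f1": sum_over_std, "f2": mean_over_std}

# Output name -> (input column, function), for every (column, function) pair.
spec = {f"{col}_{name}": (col, func) for col in ["u", "v"] for name, func in d.items()}
out = df.groupby("a").agg(**spec)
print(out)
```

Output columns follow the keyword order, so the result is a flat version of the MultiIndex layout the nested dict produced.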
Pointless / confusing.
It's too confusing to have another function that does exactly the same thing. However, the documentation could use an update on the nested-dict case with a DataFrame.
I can update the documentation. I didn't necessarily mean to create another function, but rather to try to non-destructively add this option to …
@myyc I certainly understand the use case, but the problem is basically that it is not possible to support this with the same API, as you would no longer be able to distinguish between "do you want to apply this function to this specific column?" and "do you want to give this name to the result of this function?"

I was just thinking. A possible way to give a list of functions to be applied to all columns, and also be able to name the results, would be to accept dicts or tuples in this list of functions. Currently you can pass a function or a string (for certain known functions); we could maybe extend this to dicts or tuples like: …
@jorisvandenbossche Yeah, I think that would do the trick, although the syntax is a bit weird (e.g. what happens if you pass stuff like … I understand it's impossible with the current system, but don't you think that my proposal (i.e. hypothetically changing …
Well, yes, that would be a bit strange, as I think we should only accept dicts of length 1 (therefore tuples are maybe more natural, but dicts are closer to how it works for Series). IMO, both usages of the dict are useful. For example, I myself more often use the dict to specify different functions for different columns; certainly if you have different data types in different columns that should be aggregated differently, this is very handy. For correct naming I just use a …
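One way (not necessarily the one meant above) to combine a per-column dict with custom names is to aggregate with the dict and then relabel the inner column level with `DataFrame.rename(..., level=1)`. A minimal sketch; the new labels `s`, `m` and `top` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"u": [2, 1, 4, 2, 5], "v": [3, 1, 5, 2, 3], "a": ["a", "a", "b", "a", "b"]})

# Dict: different functions for different columns.
out = df.groupby("a").agg({"u": ["sum", "mean"], "v": ["max"]})
# Rename the function labels on the inner (level 1) column level only.
out = out.rename(columns={"sum": "s", "mean": "m", "max": "top"}, level=1)
print(out)
```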
Not sure if this is a bug. This works:
while this returns an error:
Error: `KeyError: 'blah'`
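The error follows from the rule stated earlier in the thread: every key of an `agg` dict must be an existing column. The sketch below reproduces it with the thread's example data; the exception type and message have varied across pandas versions (the report above shows `KeyError: 'blah'`), so it catches broadly. It also adds a hypothetical pre-check, `check_agg_keys` (not a pandas API), that fails with a message listing the available columns:

```python
import pandas as pd

df = pd.DataFrame({"u": [2, 1, 4, 2, 5], "v": [3, 1, 5, 2, 3], "a": ["a", "a", "b", "a", "b"]})
grouped = df.groupby("a")

ok = grouped.agg({"u": "sum"})  # "u" is a column of df: works

error = None
try:
    grouped.agg({"blah": "sum"})  # "blah" is not a column of df
except Exception as exc:  # exact exception type depends on the pandas version
    error = exc
print(repr(error))


def check_agg_keys(frame, spec):
    """Hypothetical helper: raise a KeyError naming the available columns
    when a key of the agg dict `spec` is not a column of `frame`."""
    missing = [key for key in spec if key not in frame.columns]
    if missing:
        raise KeyError(f"agg keys {missing} are not columns; available: {list(frame.columns)}")


check_agg_keys(df, {"u": "sum"})  # passes silently
```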