-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: Add DataFrame.assign method #9239
Conversation
""" | ||
|
||
""" | ||
data = self.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should do a shallow copy if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we never shallow copy
much more trouble than it's worth
only indexes are shallow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the unintentional formatting on that :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hahha
the old iphone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I misunderstand pandas's datamodel, presuming that you could do a shallow copy and then update a column without modifying the original (like a dict). I see now that I was mistaken :(.
I like only minus against transform is the use in groupby is somewhat different. |
I like
|
If you intend to keep the function pure (as it is currently) then the term Neither |
FWIW, dlpyr's mutate is also pure, despite the misleading name. |
le sigh |
Augment? Enhance? I'll keep thinking.
|
I think we should go a slightly different route here. I would change the signature of I would vote to effectively deprecate the original
|
That sounds pretty clean. |
I wanted to say that |
+1 for update. I still vote for allowing multiple variables at once :). |
OK, another idea: df.assign? Update is not very useful in pandas, but it is part of the standard mapping API, where it's known as a method that does an in-place operation. |
Thanks for the input. My favorite is I'll also allow multiple variables I think. I'm going to do all of the calculations before the assignment so that we don't run into issues with one calculation depending on another, and having the success or failure of the call dependent upon the dict ordering. More tests and docs coming soon. |
3b40b32
to
24a055f
Compare
Updated with some docs and handling multiple assigns. I wasn't sure about the best place for the docs so I threw it in I am worried about people hitting subtle bugs with assigning multiple columns in one I've got a few more things to clean up and then I'll ping for review. |
24a055f
to
d70bf60
Compare
Ok, ready for feedback. Just a summary,
|
Nice feature, and some points to be considered:
|
|
I really think this should be called |
@jreback I'm all for deprecating .update, but I think it is clearer to give them a distinct name, given that the behavior is different and also different from the update method on dicts. Another name possibly worth considering is |
@shoyer we already have way too many update/set/filter etc methods. I think some consolidation is in order. I don't think adding another new method is useful at all. And you dont' actually need to deprecate the current functionaility of |
@jreback I agree on the problem but not the solution (in this specific case). Taking the deprecation and eventual removal of the current meaning of update as a given, I would rather have an assign method than an update method, because the later will always have the confusing association with duct.update (the issue is similar to the name mutate but worse). But is mostly bike shedding. |
@jreback To follow up on your edit, I am pretty opposed to functions that do very different things depending on how you call them, except as a transitional step. That is poor API design :). |
@shoyer fair enough, and I DO like |
Travis is running. Will merge in a few hours, assuming no objections. |
iris = read_csv('data/iris.data') | ||
iris.head() | ||
|
||
(iris.assign(sepal_ratio = iris['SepalWidth'] / iris['SepalLength']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need the parens here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I call .head
on the next line.
@TomAugspurger lgtm
|
e9266f5
to
7137212
Compare
@jreback I'll need to think about how this interacts with groupby. E.G. say we wanted the group-wise mean of the ratio:
Obviously not really what we want; assign returns the entire frame. But In [10]: df.assign(r=lambda x: x.sepal_length / x.sepal_width).groupby('species').r.mean()
Out[10]:
species
setosa 1.470188
versicolor 2.160402
virginica 2.230453
Name: r, dtype: float64 I suppose it'd be needed for operations where the |
💣s away? |
matplotlib.style.use('ggplot') | ||
except AttributeError: | ||
options.display.mpl_style = 'default' | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these import still needed? As you don't seem to use a plotting example anymore now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed now. Forgot to remove them.
I added two small doc remarks, for the rest, bombs away! |
Creates a new method for DataFrame, based off dplyr's mutate. Closes pandas-dev#9229
7137212
to
6a5bd89
Compare
API: Add DataFrame.assign method
Ok, thanks everyone. We can do follow-ups as needed. |
Woohoo! Well done 👍 |
very nice @TomAugspurger small issue: http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html the plot seems 'too big' size-wise. is this controlled somewhere? |
You can control this size in the |
Thanks. I'll follow up tonight.
|
@TomAugspurger I remember you did a PR for the size of the plot in the whatsnew: http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html. Did it get merged? |
Yeah: #9575 Still having problems? I can change the actual image size of that pull didn't work.
|
hmm still seems much larger than the rest of the page to me
|
Yes, that didn't seem to work for some reason. |
I opened up #9619 to fix this. |
Closes #9229
signature:
DataFrame.transform(**kwargs)
My question now is
transformed
DataFrame? Should we do any kind of checking on the index or columns?