DEPR: join_axes-kwarg in pd.concat #22318
Hello @h-vetinari! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-07-03 02:21:51 UTC |
5df8f17 to 6047a8c
Codecov Report
@@ Coverage Diff @@
## master #22318 +/- ##
===========================================
+ Coverage 41.96% 92.23% +50.26%
===========================================
Files 180 161 -19
Lines 50860 51333 +473
===========================================
+ Hits 21345 47348 +26003
+ Misses 29515 3985 -25530
091ee11 to dbb489d
pandas/core/frame.py
Outdated
@@ -6614,11 +6614,11 @@ def _join_compat(self, other, on=None, how='left', lsuffix='', rsuffix='',
if can_concat:
if how == 'left':
how = 'outer'
join_axes = [self.index]
return concat(frames, axis=1, join=how,
verify_integrity=True).reindex(self.index)
you can add copy=False to the .reindex
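For anyone following along, the pattern in this hunk can be tried in isolation. This is a minimal sketch, assuming two small made-up frames (`left` and `right` are illustrative names, not taken from the PR): an outer concat followed by `.reindex` on the first frame's index keeps the rows that `join_axes=[left.index]` would have selected. The `copy=False` suggested in review is left out because it only affects copying, not the result.

```python
import pandas as pd

# Hypothetical inputs for illustration only.
left = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])
right = pd.DataFrame({"b": [10.0, 20.0]}, index=["y", "w"])

# Outer-join along the columns, then keep exactly the rows of `left`.
# Rows that exist only in `right` ("w") are dropped by the reindex;
# rows of `left` missing from `right` get NaN in column "b".
result = pd.concat([left, right], axis=1, join="outer").reindex(left.index)
print(result)
```

Note that the intermediate outer join is built in full before the reindex trims it, which is the performance concern raised later in the thread.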
pandas/core/generic.py
Outdated
@@ -8848,7 +8848,7 @@ def describe_1d(data):
if name not in names:
names.append(name)

d = pd.concat(ldesc, join_axes=pd.Index([names]), axis=1)
d = pd.concat([x.reindex(names) for x in ldesc], axis=1, sort=False)
add copy=False
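The replacement line in this hunk reindexes each input before concatenating, rather than reindexing the result. A rough sketch of that variant, assuming two made-up summary Series (`s1`, `s2` and the labels in `names` are illustrative, not the real `describe` output):

```python
import pandas as pd

# Hypothetical per-column summaries with different row labels.
s1 = pd.Series({"count": 3.0, "mean": 2.0}, name="a")
s2 = pd.Series({"count": 3, "unique": 2, "top": "u"}, name="b")

# The row order we want in the final frame.
names = ["count", "mean", "unique", "top"]

# Align every input to `names` first; the concat then has nothing left to join.
d = pd.concat([s.reindex(names) for s in (s1, s2)], axis=1, sort=False)
print(d)
```

Because every input already carries the target index, no larger intermediate result is built and then trimmed.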
pandas/core/groupby/generic.py
Outdated
@@ -518,8 +518,10 @@ def _transform_general(self, func, *args, **kwargs):
applied.append(res)

concat_index = obj.columns if self.axis == 0 else obj.index
concatenated = concat(applied, join_axes=[concat_index],
axis=self.axis, verify_integrity=False)
other_axis = (self.axis + 1) % 2 # switches from 0 to 1 or from 1 to 0
don't use this, just use an if else
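A quick check that the two spellings agree for the only valid axis values. The function names here are hypothetical and exist only for this comparison:

```python
def other_axis_modulo(axis: int) -> int:
    # Arithmetic form from the hunk under review.
    return (axis + 1) % 2


def other_axis_if_else(axis: int) -> int:
    # Explicit conditional form requested by the reviewer.
    return 1 if axis == 0 else 0


# Both forms map 0 -> 1 and 1 -> 0.
for axis in (0, 1):
    assert other_axis_modulo(axis) == other_axis_if_else(axis)
```

The conditional form states the intent directly, so it does not need the explanatory comment that the modulo version carries.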
dbb489d to f7902e4
small comment, lgtm. ping on green.
pandas/core/reshape/concat.py
Outdated
if ndim == 2:
other_axis = 1 if axis == 0 else 0 # switches between 0 & 1
res = res.reindex(join_axes[0], axis=other_axis)
else: # 3 for Panel; Panel4D already deprecated
no need to mention Panel4D.
use elif ndim == 3 to be explicit
@jorisvandenbossche you had some comments on the issue. I am all for making things simpler. |
@jorisvandenbossche
Fair enough, that was a misunderstanding on my part. But the dtype-question is a side-issue here, IMO
I posed this question in the OP ("Only question is if performance would be much worse, if concatenating huge Series/DFs before selecting small index-subset."), and it's a valid point. The user could apply a reindex directly to the arguments of
In my opinion, the There might be a good solution for replacing the
Then index the non-concatenation index of the result could be set as desired (essentially replacing
There's not much on SO - https://www.google.com/search?q=stackoverflow+pandas+concat+join_axes yields only one (sorta) relevant hit: https://stackoverflow.com/q/27391081 Neither did I find much of substance here: https://github.com/search?p=2&q=join_axes&type=Issues |
0bdbdde to b0133b0
All green (rebased due to persistent test failures). |
Green and all feedback incorporated. Anything missing? |
some comments. ping on green.
pandas/core/generic.py
Outdated
@@ -8930,7 +8930,8 @@ def describe_1d(data):
if name not in names:
names.append(name)

d = pd.concat(ldesc, join_axes=pd.Index([names]), axis=1)
d = pd.concat([x.reindex(names) for x in ldesc], axis=1,
use copy=False on the .reindex. remove copy= from the pd.concat; it is not respected and is misleading here.
pandas/core/groupby/generic.py
Outdated
concatenated = concat(applied, join_axes=[concat_index],
axis=self.axis, verify_integrity=False)
other_axis = 1 if self.axis == 0 else 0 # switches between 0 & 1
concatenated = concat(applied, axis=self.axis,
remove copy= from the concat
pandas/core/reshape/concat.py
Outdated
"length {length}".format(length=ndim - 1)) | ||
if ndim == 2:
other_axis = 1 if axis == 0 else 0 # switches between 0 & 1
res = res.reindex(join_axes[0], axis=other_axis)
add copy=False to the .reindex
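To make the 2-D branch concrete, here is a small sketch of reindexing the non-concatenation axis after a row-wise concat. The frames and the `wanted` labels are made up for illustration; only `pd.concat` and `DataFrame.reindex(..., axis=...)` are real pandas calls.

```python
import pandas as pd

# Hypothetical inputs with partially overlapping columns.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"b": [5], "c": [6]})

axis = 0  # concatenate along the rows
other_axis = 1 if axis == 0 else 0  # the axis that join_axes used to control

# Labels to keep on the other axis (what join_axes[0] held in the hunk above).
wanted = pd.Index(["a", "b"])

res = pd.concat([df1, df2], axis=axis, sort=False).reindex(wanted, axis=other_axis)
print(res)
```

Column "c" is dropped and column "a" is NaN for the row that came from `df2`, since that frame had no such column.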
144ef86 to 99f029d
I answered above (#22318 (comment)) to your remarks/questions in the issue. Any further comments, API- or otherwise? |
ping on green |
@jorisvandenbossche i am ok with this |
@h-vetinari sorry for my slow follow-up, didn't have much time the last weeks. I also saw that the discussion here (about an alternative way to do it) is related to the joining discussions in other PRs/issues (you linked #21855). I will try to take a look at those discussions one of the coming days. I personally think we should first decide on an alternative (or decide whether we want one) before actually deprecating this, but no strong opinion. BTW, if we keep the PR as is, the documentation certainly needs an update (there are examples of |
Any luck with that? As a potential shortcut to reading that other discussion, you could have a look at the pseudo-implementation I wrote above in #22318 (comment). Barring a decision on what to replace the API with, which tutorial docs do you mean? I didn't find anything in https://pandas.pydata.org/pandas-docs/stable/tutorials.html, but there's a bit in |
looks ok to me, @jorisvandenbossche has some questions. |
@jorisvandenbossche |
Could you retrigger that timed-out travis job please? ;-) |
All the CI builds had passed, except for codecov (I imagine because it had forgotten the coverage of the commit where this PR had originally branched off)
Fair enough... |
in regards to: #22318 (comment) this is a separate and distinct change. let's treat it as such. Keeping PRs nice and simple is the way to go. |
Fair point, but without someone doing a PR, we'll have just a deprecation, without the replacement. I think it's equally fair to say that that replacement is part of the deprecation itself (to uphold existing usability where necessary). I'll be gone the next two weeks, but then, this PR is not in a hurry... |
@h-vetinari IMHO we don't need a replacement at all. |
In any event, if you can respond to the remaining comments (soon) we can merge this. |
This PR needs updating in several different ways (also review from @jorisvandenbossche), and as I mentioned I'm currently away for two weeks. I'd like to clean this up when I come back (realising that it'll be after 0.25), and properly replace the removed functionality with something like |
@h-vetinari that is a separate PR that may not be accepted |
What's the status here? Is this happening for 0.25.0? |
prob needs a small amount of fixup; let me see if I can do this in the next couple of hours. |
Thanks (I'm roughly pushing for an RC tomorrow, maybe cutting it tomorrow night)
I wouldn't merge this before 0.25. It's been open for almost a year, so it's obviously not urgent. And I disagree that the direct replacement for the functionality we're deprecating should be a separate PR. If this gets merged before 0.25 there won't be replacement for that version (except the workaround of using OTOH, with a direct replacement in the same PR as the deprecation, one could formulate a clear path for updating. |
thanks @h-vetinari. again, 1 PR 1 thing. The reason things generally take so long (not in this case) is that there are too many things in 1 place. |
I have to say I agree with @h-vetinari that if we want something that replaces this functionality, we should not include this in 0.25.0 (unless we get the replacement in 0.25.0 as well). It is quite annoying to already start warning for something before a replacement is available (again, if we want to add such a replacement). |
there already is a very good alternative, |
It was from a discussion between the both of us that I commented #22318 (comment) with the idea of a But still, it is something to discuss. There is also the idea of introducing a |
I mean maybe, but I'd like to see an actual usecase. Removing API is easy, adding new should have a high bar. |
@jreback |
git diff upstream/master -u -- "*.py" | flake8 --diff