ENH: more joins for DataFrame.update #21855
Comments
I think this is a pretty solid proposal. That being said, given that this is a core part of
we only join on a single axes at a time
@jreback I don't think I follow completely. Of course But more to the point, both methods satisfy different needs, and
But this does not help us, as this does not give an answer whether the index should be
Actually, one can very easily emulate joins in two axes with
The only thing missing here would be the processing of the
Personally, I think this is much more readable than:
So, summing up,
@jreback Any response to the above?
@pandas-dev @jreback gentle ping :)
I'm +1 on deprecating
(which as a bonus - and only if trivial implementation-wise - could accept an actual
Am I missing anything @h-vetinari ?
Good idea! Certainly better than the
And even though The idea for the "bonus" is very similar to the Thinking a bit more about
Alternatively,
@toobaz Any thoughts to the above, resp. preferences for the summary below (not counting deprecation cycles)? I'm torn between options 1. & 2.
Considerations for the API to allow different join-types for
So then, the question is which combination makes the most sense. Here the variants so far for
And the variants so far for
I think that - from the POV of consistency (and the idea of @toobaz to be able to pass indexes directly) - the most reasonable choice would be DF_1./S_1. In particular, then [edit 180823]: added some more options
@toobaz If I may ask for your input to the above (since your last response), then I could formulate a cleaned-up proposal which can then be reviewed by everybody.
@gfyoung @toobaz @jreback @TomAugspurger @jorisvandenbossche With
@wesm https://github.com/pandas-dev/pandas/blob/v0.23.4/pandas/core/frame.py#L5054 Open questions:
@h-vetinari : Sorry that we've been pretty dark on this. I think there is definitely interest, but time is certainly a factor here. @jreback @toobaz : thoughts?
Still very busy, sorry. Will be back next week.
I'm completely underwater until mid-November. If there is truly an impasse where I can help / weigh in, please let me know and I'll make the time
@jreback @toobaz @jorisvandenbossche Friendly ping. :)
Some random thoughts for now:
@toobaz
This should be non-controversial, as even the source code for `DataFrame.update` literally says (https://github.com/pandas-dev/pandas/blob/v0.23.3/pandas/core/frame.py#L5054): `# TODO: Support other joins`
I tried to look if a previous issue for this exists, but did not find one.
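For reference, a minimal sketch of the current left-join-only behaviour that the TODO refers to. The frames `df` and `other` are made-up illustration data, not taken from the pandas source:

```python
import numpy as np
import pandas as pd

# Illustration data (hypothetical): `other` has an extra row label (2)
# that does not exist in `df`.
df = pd.DataFrame({"a": [1.0, np.nan]}, index=[0, 1])
other = pd.DataFrame({"a": [np.nan, 2.0, 3.0]}, index=[0, 1, 2])

# update() aligns `other` to df's index (a left join), modifies df in place
# and returns None. NaN values in `other` never overwrite existing values.
ret = df.update(other)

print(ret)
print(df)
```

Row `2` of `other` is silently ignored because it is not in `df`'s index, which is exactly what other join types would change.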
Some thoughts that arise:
- The default for `join` should clearly be `'left'`.
- `df1.update(df2, join='right', overwrite=some_boolean)` would be the same as `df2.update(df1, join='left', overwrite=not some_boolean)`. IMO this is not a terrible redundancy, as it allows each user to choose the order that more easily fits their thought pattern.
- `df1.combine_first(df2)` would be the same as `df1.update(df2, join='outer', overwrite=False)`, only that `combine_first` has far fewer options and controls (i.e. `filter_func` and `raise_conflict`). Actually, I'd very much like to deprecate `combine_first` before pandas 1.0. The only difference is that `update` returns `None`, which should be changed as well IMO. Relevant xrefs: ENH: add inplace-kwarg to update #21858; DEPR: combine_first (replace with update(..., join='outer'); for both Series/DF) #21859
- `axis=0|1|None`-keyword, like in `DataFrame.align`. However, upon further investigation, I don't believe this to be a good choice, as anything other than `axis=None` would implicitly have to choose a `join` for the other axis to actually decide the index/columns of the result.
- `list` and `tuple` would be reasonable to allow as containers, but not more.
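The equivalences claimed in the list above can be checked with today's API by reindexing before calling `update`. This is only a sketch under assumptions: `update` has no `join=` keyword at the time of writing, so the `reindex` calls below stand in for the proposed `join='outer'` and `join='right'`, joining on both index and columns. The frames are made-up illustration data.

```python
import numpy as np
import pandas as pd

# Illustration data (hypothetical); the frames overlap only partially.
df1 = pd.DataFrame({"a": [1.0, np.nan, 3.0]}, index=[0, 1, 2])
df2 = pd.DataFrame(
    {"a": [np.nan, 20.0, 40.0], "b": [0.1, 0.2, 0.4]}, index=[0, 1, 3]
)

# Reference result: combine_first uses the union of both axes and keeps
# df1's non-NaN values.
combined = df1.combine_first(df2)

# Stand-in for the proposed df1.update(df2, join='outer', overwrite=False):
# widen df1 to the union of both axes, then fill only its NaN cells.
outer = df1.reindex(
    index=df1.index.union(df2.index),
    columns=df1.columns.union(df2.columns),
)
outer.update(df2, overwrite=False)

# Stand-in for the proposed df1.update(df2, join='right', overwrite=True):
# restrict df1 to df2's axes, then let df2's non-NaN values win.
right = df1.reindex(index=df2.index, columns=df2.columns)
right.update(df2, overwrite=True)

# The existing left join with swapped operands and negated overwrite.
left_swapped = df2.copy()
left_swapped.update(df1, overwrite=False)

pd.testing.assert_frame_equal(outer, combined)
pd.testing.assert_frame_equal(right, left_swapped)
```

Both assertions pass on this data, which supports the claimed redundancy but does not prove it for every dtype or for frames with duplicate labels.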