-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Index with right_index=True or left_index=True merge #34468
Conversation
Change merge Change test_merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll also want to add a whatsnew note
def test_right_index_true_right_join_target_index(): | ||
df_left = pd.DataFrame(index=["a", "b"]) | ||
df_right = pd.DataFrame({"x": ["a", "c"]}) | ||
|
||
result = pd.merge(df_left, df_right, left_index=True, right_on="x", how="left") | ||
expected = pd.DataFrame({"x": ["a", "b"]}, index=Index(["a", "b"])) | ||
tm.assert_frame_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean "left" instead of "right" in the test name? I'd also include a link to the issue as a comment, and rename df_left -> left, df_right -> right to be consistent with other tests.
How does this behave when we actually are doing a right join? It feels like it should exactly equal the right DataFrame (index and values), but someone who knows more than me would have to chime in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you are right of course. It was late yesterday evening :)
I added a test to show the behavior in case of a right join. I am not quite sure what to expect. The current behavior is the result you described. But the documentation says, that indexes are passed along if used in join. I am not sure how to interpret this in case of right join when left index is used.
I would like to wait until the solution is final. There are still a few open points in case of outer joins if my change should be the future behavior for left and right joins I am not sure how to create the target index for an outer join whe the indexes are used in the join condition. My interpretation would be to merge them in case of equal names and create a MultiIndex in case of different names. |
@phofl if you merge master will have a look |
…243index � Conflicts: � pandas/tests/reshape/merge/test_merge.py
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
@phofl if you can merge master will have a look |
closing as stale. if you want to continue, pls ping and can re-open. |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Hi,
I would expect the output form the merge function as specified in #28243, i.e. if you join right with right_index=True the resulting DataFrame should have the right index.
The documentation says, that the index is passed on if it is part of the merge.
This change broke a few test, which expected in this case that the index was the index which was merged onto the base DataFrame. I fixed them temporarily until I know how to proceed in this case.
Should this as implemented be the desired behavior for the right_index=True case?
The code is not final, it is just a first draft. I would have to change the tests for good after we decide how to implement this.