Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Index with right_index=True or left_index=True merge #34468

Closed
wants to merge 10 commits into from

Conversation

phofl
Copy link
Member

@phofl phofl commented May 29, 2020

Hi,

I would expect the output form the merge function as specified in #28243, i.e. if you join right with right_index=True the resulting DataFrame should have the right index.

The documentation says, that the index is passed on if it is part of the merge.

This change broke a few test, which expected in this case that the index was the index which was merged onto the base DataFrame. I fixed them temporarily until I know how to proceed in this case.

Should this as implemented be the desired behavior for the right_index=True case?

The code is not final, it is just a first draft. I would have to change the tests for good after we decide how to implement this.

Change merge

Change test_merge
Copy link
Member

@dsaxton dsaxton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll also want to add a whatsnew note

Comment on lines 2245 to 2251
def test_right_index_true_right_join_target_index():
df_left = pd.DataFrame(index=["a", "b"])
df_right = pd.DataFrame({"x": ["a", "c"]})

result = pd.merge(df_left, df_right, left_index=True, right_on="x", how="left")
expected = pd.DataFrame({"x": ["a", "b"]}, index=Index(["a", "b"]))
tm.assert_frame_equal(result, expected)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean "left" instead of "right" in the test name? I'd also include a link to the issue as a comment, and rename df_left -> left, df_right -> right to be consistent with other tests.

How does this behave when we actually are doing a right join? It feels like it should exactly equal the right DataFrame (index and values), but someone who knows more than me would have to chime in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you are right of course. It was late yesterday evening :)

I added a test to show the behavior in case of a right join. I am not quite sure what to expect. The current behavior is the result you described. But the documentation says, that indexes are passed along if used in join. I am not sure how to interpret this in case of right join when left index is used.

pandas/core/reshape/merge.py Outdated Show resolved Hide resolved
@phofl
Copy link
Member Author

phofl commented May 30, 2020

You'll also want to add a whatsnew note

I would like to wait until the solution is final. There are still a few open points in case of outer joins if my change should be the future behavior for left and right joins

I am not sure how to create the target index for an outer join whe the indexes are used in the join condition. My interpretation would be to merge them in case of equal names and create a MultiIndex in case of different names.

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode and removed Needs Review labels Nov 26, 2020
@jreback
Copy link
Contributor

jreback commented Nov 26, 2020

@phofl if you merge master will have a look

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Dec 27, 2020
@jreback
Copy link
Contributor

jreback commented Dec 29, 2020

@phofl if you can merge master will have a look

@jreback
Copy link
Contributor

jreback commented Feb 11, 2021

closing as stale. if you want to continue, pls ping and can re-open.

@jreback jreback closed this Feb 11, 2021
@phofl phofl deleted the 28243index branch April 26, 2021 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants