BUG: Regression of Index.join() when return_indexers=True #58603

Open
2 of 3 tasks
michaelpradel opened this issue May 6, 2024 · 2 comments
Labels
Bug, Needs Triage (Issue that has not been reviewed by a pandas team member)

Comments

@michaelpradel

michaelpradel commented May 6, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
index1 = pd.Index([1, 3, 4])
index2 = pd.Index([2, 2, 3])
joined_index, lidx, ridx = index1.join(index2, how='right', return_indexers=True)
print(joined_index, lidx, ridx)

Issue Description

Unlike pandas 2.2.2, the current main branch does not always return both indexers. The example above prints the following output, where the right indexer is unexpectedly None:
Index([2, 2, 3], dtype='int64') [-1 -1 1] None

The change in behavior was introduced in #56841, which was intended to improve performance without otherwise changing behavior.
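Until this is resolved, code that unpacks both indexers can guard against the None. The sketch below assumes, as the output above suggests, that a None indexer means the corresponding input already lines up position for position with the joined index, so it can be replaced by an identity indexer. The helper name `materialize_indexer` is hypothetical and not part of pandas.

```python
from typing import List, Optional, Sequence


def materialize_indexer(indexer: Optional[Sequence[int]], joined_len: int) -> List[int]:
    """Hypothetical workaround helper (not a pandas API).

    Assumption: a None indexer stands for the identity mapping, i.e. this
    input side already matches the joined index position for position.
    """
    if indexer is None:
        # Identity: element i of the joined index comes from position i of this input.
        return list(range(joined_len))
    # Otherwise copy the concrete positions (works for lists and NumPy arrays).
    return [int(i) for i in indexer]


# Values taken from the main-branch output shown above.
joined_len = 3  # len(Index([2, 2, 3]))
lidx = [-1, -1, 1]
ridx = None
print(materialize_indexer(lidx, joined_len), materialize_indexer(ridx, joined_len))
# [-1, -1, 1] [0, 1, 2]
```

With pandas objects, the call would be `materialize_indexer(ridx, len(joined_index))`, which yields the same right indexer that pandas 2.2.2 returns for this example.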

Expected Behavior

Pandas 2.2.2 prints this output, which is what I'd expect:
Index([2, 2, 3], dtype='int64') [-1 -1 1] [0 1 2]
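For reference, the expected indexers for this example can be derived by hand. As the expected output shows, the joined index here is just the right index, so the right indexer is 0..n-1, and each left indexer entry is the position of that label in the left index, or -1 when the label is missing. The sketch below encodes that rule in plain Python. It assumes the left index has unique labels (true for `index1` here), it is not how pandas computes joins internally, and the function name `right_join_indexers` is hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple


def right_join_indexers(
    left: Sequence[Hashable], right: Sequence[Hashable]
) -> Tuple[List[Hashable], List[int], List[int]]:
    """Hypothetical reference for a right join when `left` has unique labels."""
    # Map each left label to its position; assumes no duplicate labels in `left`.
    left_pos = {label: i for i, label in enumerate(left)}
    if len(left_pos) != len(left):
        raise ValueError("this sketch assumes the left index is unique")
    joined = list(right)  # the joined index keeps the right labels in their order
    lidx = [left_pos.get(label, -1) for label in joined]  # -1 marks a missing label
    ridx = list(range(len(right)))  # each right label maps to its own position
    return joined, lidx, ridx


print(right_join_indexers([1, 3, 4], [2, 2, 3]))
# ([2, 2, 3], [-1, -1, 1], [0, 1, 2])
```

The result matches the pandas 2.2.2 output above, including a concrete right indexer rather than None.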

Installed Versions

INSTALLED VERSIONS
------------------
commit : 9fee6cf
python : 3.10.8.final.0
python-bits : 64
OS : Linux
OS-release : 6.5.0-28-generic
Version : #29-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar 28 23:46:48 UTC 2024
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 3.0.0.dev0+850.g9fee6cfc59
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 63.2.0
pip : 24.0
Cython : 3.0.10
pytest : 8.2.0
hypothesis : 6.100.2
sphinx : 7.3.7
blosc : None
feather : None
xlsxwriter : 3.2.0
lxml.etree : 5.2.1
html5lib : 1.1
pymysql : 1.4.6
psycopg2 : 2.9.9
jinja2 : 3.1.3
IPython : 8.24.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : 1.3.8
fastparquet : 2024.2.0
fsspec : 2024.3.1
gcsfs : 2024.3.1
matplotlib : 3.8.4
numba : 0.59.1
numexpr : 2.10.0
odfpy : None
openpyxl : 3.1.2
pyarrow : 16.0.0
pyreadstat : 1.2.7
python-calamine : None
pyxlsb : 1.0.10
s3fs : 2024.3.1
scipy : 1.13.0
sqlalchemy : 2.0.29
tables : 3.9.2
tabulate : 0.9.0
xarray : 2024.3.0
xlrd : 2.0.1
zstandard : 0.22.0
tzdata : 2024.1
qtpy : None
pyqt5 : None

michaelpradel added the Bug and Needs Triage labels on May 6, 2024
@rabelmervin

Hi sir, I would like to work on this issue. Could you please tell me where the relevant file is in the repository?

@michaelpradel
Author

Since the change in behavior was introduced in #56841, the code modified by that PR is probably a good starting point.
