-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Right/outer merge behaviour on left column and right index is unexpected #17257
Comments
@cornucrapia : Thanks for reporting this! Could you instead print out the |
@gfyoung absolutely. See below. repr(merged_right)
Out[3]: ' other year a b\n123 a 2000 2 6\n124 b 2000 2 6\n125 c 2000 2 6\n126 d 2001 3 7\n127 e 2002 4 8\n128 f 2002 4 8\n129 g 2002 4 8\n130 NaN 2003 1 5'
repr(merged_outer)
Out[4]: ' other year a b\n123 a 2000 2.0 6.0\n124 b 2000 2.0 6.0\n125 c 2000 2.0 6.0\n126 d 2001 3.0 7.0\n127 e 2002 4.0 8.0\n128 f 2002 4.0 8.0\n129 g 2002 4.0 8.0\n130 h 2004 NaN NaN\n130 NaN 2003 1.0 5.0' |
@cornucrapia : Sorry, I misspoke 😄. Can you just print out the In[5]: merged_right
Out[5]: <DataFrame> |
merged_right
Out[5]:
other year a b
123 a 2000 2 6
124 b 2000 2 6
125 c 2000 2 6
126 d 2001 3 7
127 e 2002 4 8
128 f 2002 4 8
129 g 2002 4 8
130 NaN 2003 1 5
merged_outer
Out[6]:
other year a b
123 a 2000 2.0 6.0
124 b 2000 2.0 6.0
125 c 2000 2.0 6.0
126 d 2001 3.0 7.0
127 e 2002 4.0 8.0
128 f 2002 4.0 8.0
129 g 2002 4.0 8.0
130 h 2004 NaN NaN
130 NaN 2003 1.0 5.0 |
Ah, that's a little better to read now! I do agree: I too am a little perplexed why there would be such a row, as I don't know where that would show up in either such join. |
I believe this is the same example import pandas as pd
left = pd.DataFrame({'a': [1, 2, 3], 'key': [0, 1, 1]})
right = pd.DataFrame({'b': [1, 2, 3]})
result = left.merge(right, left_on='key', right_index=True, how='right') output In [5]: left
Out[5]:
a key
0 1 0
1 2 1
2 3 1
In [6]: right
Out[6]:
b
0 1
1 2
2 3
In [7]: result
Out[7]:
a key b
0 1.0 0 1
1 2.0 1 2
2 3.0 1 2
2 NaN 2 3 I may have expected something like In [11]: pd.merge(left, right.rename_axis('key').reset_index(), on='key', how='right').set_index('key', drop=False)
Out[11]:
a key b
key
0 1.0 0 1
1 2.0 1 2
1 3.0 1 2
2 NaN 2 3 https://github.com/pandas-dev/pandas/pull/24904/files is adding a test with that. I'll add a note to that test that it's probably buggy, and to refer here. |
@TomAugspurger I would expect the output
for the operation
Could you give me a hint, why you would expect a 2 in the |
Is that from the index in |
I would have expected, that the key column is the original key column, not influenced by the index. But I am not sure if this is the desired behavior. I would expected index = index of right df, a and key cols with values from left df and b with values from right df. I am currently working on the index issue, which is unrelated with the key column problem. But I would try to work on this after figuring the index issue out, if I know how this should work. |
I thought |
Yes you join the column |
Ah right. Mine is probably mistaken. |
Ok thanks. I have a small follow up question about the initial example. I would expect that the index there is the index of the right |
Code Sample, a copy-pastable example if possible
Problem description
merged outer and merged right both return year 2003, a 1, b 5 associated with index 130.
Expected Output
Either a NaN index entry or an error/warning. There's no obvious reason to associate that data with 130 index
Output of
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None
pandas: 0.20.2
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
The text was updated successfully, but these errors were encountered: