-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Join result takes the index order of the other (right) DataFrame instead of the calling's (left) one #15582
Comments
hmm, I think this needs a suite of tests. There are very few for |
@jreback noob here, where to write these tests? edit: found /tests , I guess I'm just looking for guidance ? |
|
@jreback OK cool, thanks for the quick reply |
@jreback So if I just swap And then I guess you want tests written for join to test the sort param? |
i think this might break a large amount of tests |
Old:
New:
my
They all fail with |
note that this is only going to matter for `how='inner' In any event need tests for all of these. |
The bug is not at the DataFrame level but at the Index level. The swap of arguments in the call to merge proposed by @sleepdeprivation is going to generate a new bug with swapped lsuffix and rsuffix. Moreover the order of the columns in the result is also exchanged (first columns from the right DataFrame, then columns from the left one). The real bug is at the Index level, concretely in the Index.intersection() method. The get_indexer must be called on the right index and with left index as argument, so that it returns a mask which transforms the caller index (right) into the passed index (left), and not the other way around. This is the desired output: a mask on the right index which picks its elements in such a way that it gets aligned with the left index. I have implemented/updated some tests in my PR. |
One question about the tests: I have implemented/updated some tests in files tests/indexes/test_base.py and tests/frame/test_misc_api.py (test_join_index), but as you said there are also tests on join in file tests/tools/test_join.py. I do not understand the logic in the directory structure of the tests. |
I think it should be self-evident. you have tests on the indexes themselves, and the api on frames (which only is a dispatch from |
Issue pandas-dev#15582: fix the sort=True bug in DataFrame.join
Issue pandas-dev#15582: fix the sort=True bug in DataFrame.join
closes pandas-dev#15582 Author: Albert Villanova del Moral <albert.villanova@gmail.com> Author: Jeff Reback <jeff@reback.net> Closes pandas-dev#15583 from albertvillanova/fix-15582 and squashes the following commits: 2d4e143 [Albert Villanova del Moral] Fix pytest fixture name collision 64e86a4 [Albert Villanova del Moral] Fix test on right join 73df69e [Albert Villanova del Moral] Address requested changes 8d2e9cc [Albert Villanova del Moral] Address requested changes 968c7f1 [Jeff Reback] DOC/TST: change to use parameterization 9e39794 [Albert Villanova del Moral] Address requested changes 5bf1508 [Albert Villanova del Moral] Address requested changes 654288b [Albert Villanova del Moral] Fix Travis errors 33eb740 [Albert Villanova del Moral] Address requested changes 3c200fe [Albert Villanova del Moral] Add new tests ef2581e [Albert Villanova del Moral] Fix Travis error f0d9d03 [Albert Villanova del Moral] Add whatsnew c96306d [Albert Villanova del Moral] Add sort argument to Index.join 047b513 [Albert Villanova del Moral] Address requested changes ec836bd [Albert Villanova del Moral] Fix Travis errors b977278 [Albert Villanova del Moral] Address requested changes 784fe75 [Albert Villanova del Moral] Fix error: line too long 1197b99 [Albert Villanova del Moral] Fix DataFrame column order when read from HDF file d9e29f8 [Albert Villanova del Moral] Create new DatetimeIndex from the Index.intersection result e7bcd28 [Albert Villanova del Moral] Fix typo in documentation a4ead99 [Albert Villanova del Moral] Fix typo c2a8dc3 [Albert Villanova del Moral] Implement tests c12bb3f [Albert Villanova del Moral] BUG: Fix index order for Index.intersection()
Code Sample, a copy-pastable example if possible
Problem description
Contrary to what is stated in the documentation of DataFrame.join(), when using the default sort=False, the return DataFrame preserves the index order of the other (right) DataFrame, instead of the index order of the calling (left) DataFrame.
Besides, the sort=True argument does not work.
Expected Output
The expected output is that the return DataFrame should preserve the index order of the calling (left) DataFrame.
Output of
pd.show_versions()
pandas: 0.19.2
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0.post20161106
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.42.0
pandas_datareader: 0.2.1
The text was updated successfully, but these errors were encountered: