BUG: .iloc and .loc behavior not consistent on empty dataframe #9983

Merged
merged 1 commit into from
Apr 30, 2015

Conversation

artemyk
Contributor

@artemyk artemyk commented Apr 25, 2015

Fixes #9964 .

Notice that assert_frame_equal now fails for empty dataframes with different dtypes (as, I think, it should). However, this means some tests need to be patched now.

@artemyk artemyk force-pushed the loc_fix branch 4 times, most recently from 1089142 to 06dd4d8 Compare April 26, 2015 00:19
@artemyk artemyk closed this Apr 26, 2015
@artemyk artemyk reopened this Apr 26, 2015
@hayd
Contributor

hayd commented Apr 26, 2015

What other changes is this making? (there are a lot of edge cases to empty frames!)

For instance the tests in test_groupby looked correct before...

@artemyk
Contributor Author

artemyk commented Apr 26, 2015

The problem is that, because .iloc loses dtypes on an empty dataframe, dtypes are not checked when comparing empty dataframes. E.g., right now on master:

In [1]: import pandas as pd

In [2]: df1=pd.DataFrame(columns=["col1","col2"])

In [3]: df1["col1"] = df1["col1"].astype('int64')

In [4]: df2=pd.DataFrame(columns=["col1","col2"])

In [5]: print df1.dtypes
col1     int64
col2    object
dtype: object

In [6]: print df2.dtypes
col1    object
col2    object
dtype: object

In [7]: pd.util.testing.assert_frame_equal(df1, df2, check_dtype=True)

In [8]: 

On this PR, by contrast, the assert fails. So a couple of tests that compare empty dataframes with different dtypes need to be patched.
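For readers on a current pandas release, the comparison above can be sketched roughly as follows. This is an illustrative sketch, not code from the PR: it assumes the public `pandas.testing.assert_frame_equal` entry point (the session above uses the older `pd.util.testing` path), and it builds the frames from empty typed Series so the dtypes are explicit. The `dtype_mismatch_detected` flag is a hypothetical name used only for illustration.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Two empty frames with identical column names; only col1's dtype differs.
df1 = pd.DataFrame({"col1": pd.Series(dtype="int64"),
                    "col2": pd.Series(dtype="object")})
df2 = pd.DataFrame({"col1": pd.Series(dtype="object"),
                    "col2": pd.Series(dtype="object")})

# With dtype checking disabled, the frames compare equal
# (same columns, no rows).
assert_frame_equal(df1, df2, check_dtype=False)

# With dtype checking enabled, the mismatch on col1 should be reported.
try:
    assert_frame_equal(df1, df2, check_dtype=True)
    dtype_mismatch_detected = False
except AssertionError:
    dtype_mismatch_detected = True
```

If the dtype check behaves as described in this PR, `dtype_mismatch_detected` ends up `True`.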

@@ -324,12 +324,12 @@ def test_frame_to_json_except(self):
def test_frame_empty(self):
df = DataFrame(columns=['jim', 'joe'])
self.assertFalse(df._is_mixed_type)
assert_frame_equal(read_json(df.to_json()), df)
Contributor

Wow, this seems bad that this test passes at all. df.to_json() == {} here!!

Contributor Author

I believe df.to_json() == '{"jim":{},"joe":{}}'. The test is checking that the number and names of the columns are correct.
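A quick way to check this claim, sketched under the assumption of a recent pandas release (where `read_json` expects a file-like object, hence the `StringIO` wrapper, which is not in the original test):

```python
import json
from io import StringIO

import pandas as pd

# Empty frame with two named columns, as in test_frame_empty.
df = pd.DataFrame(columns=["jim", "joe"])

# Serialize with the default orient and inspect the structure.
payload = df.to_json()
parsed = json.loads(payload)

# Round-trip through read_json; column names should survive,
# even though there are no rows to carry dtype information.
roundtrip = pd.read_json(StringIO(payload))
```

The round trip keeps the column names, but an empty JSON payload carries no dtype information, which is why a dtype-aware comparison of the two frames is a separate question.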

@hayd
Contributor

hayd commented Apr 26, 2015

Please test the corrected behaviour of assert_frame_equal in test_testing.

IIUC This means previously fleshed out behaviour for empty edge case dtypes has been broken for some time?

@artemyk
Contributor Author

artemyk commented Apr 26, 2015

OK, added test to test_testing.

Yes, I think assert_frame_equal was broken for empty dataframes. It would still compare number/names of columns, but not their dtypes.

@jreback jreback added the Bug and Indexing (related to indexing on series/frames, not to indexes themselves) labels Apr 28, 2015
@@ -4422,6 +4436,14 @@ def test_indexing_assignment_dict_already_exists(self):
expected.loc[5] = [9, 99]
tm.assert_frame_equal(df, expected)

def test_indexing_dtypes_on_empty(self):
df = DataFrame({'a':[1,2,3],'b':['b','b2','b3']})
Contributor

need a comment with the issue number
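The behaviour this test targets (issue #9964) can be illustrated with a short sketch. It assumes a current pandas release and uses `.loc`/`.iloc` with an empty list indexer; the variable names are illustrative and not taken from the PR.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["b", "b2", "b3"]})

# Select zero rows through each indexer.
empty_loc = df.loc[[], :]
empty_iloc = df.iloc[[], :]

# Both empty selections should keep the original per-column dtypes.
loc_keeps_dtypes = list(empty_loc.dtypes) == list(df.dtypes)
iloc_keeps_dtypes = list(empty_iloc.dtypes) == list(df.dtypes)
```

Both flags being `True` is the consistency the PR title asks for: an empty selection should not silently change column dtypes, regardless of which indexer produced it.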

@jreback jreback added this to the 0.17.0 milestone Apr 28, 2015
@artemyk artemyk force-pushed the loc_fix branch 2 times, most recently from 6fe6eae to 0245bfd Compare April 28, 2015 01:12
@artemyk
Contributor Author

artemyk commented Apr 28, 2015

@jreback Made changes, except for moving test out of test_testing --- are you sure about that?
Also, came across another weird bug --- see comments above.

@jreback jreback modified the milestones: 0.16.1, 0.17.0 Apr 28, 2015
df1["col1"] = df1["col1"].astype('int64')
df2=pd.DataFrame(columns=["col1","col2"])
self._assert_equal(df1, df2, check_dtype=False)
self._assert_not_equal(df1, df2, check_dtype=True)
Contributor

ok then. I would actually use assert_frame_equal then, otherwise you may be subtly testing something else.

Contributor Author

That's in class TestAssertFrameEqual --- all the tests there are clearly for testing assert_frame_equal.

@artemyk
Contributor Author

artemyk commented Apr 29, 2015

@jreback OK, fixed test_frame.py

@jreback
Contributor

jreback commented Apr 29, 2015

@artemyk ok looks good. ping on green (if you want to rebase on master, prob a release note conflict, ok, otherwise will do it on merging).

Tests

Fix

Test reorder

Doc update

Tests fix

Tests fix

SQL tests fix

Testing update

Fixes

Testing fix

Test fix
@artemyk
Contributor Author

artemyk commented Apr 30, 2015

@jreback Ready to go.

jreback added a commit that referenced this pull request Apr 30, 2015
BUG: .iloc and .loc behavior not consistent on empty dataframe
@jreback jreback merged commit 28b1488 into pandas-dev:master Apr 30, 2015
@jreback
Contributor

jreback commented Apr 30, 2015

@artemyk thanks!

proof that sometimes trivial looking things (in indexing), actually can have lots of cases / need lots of testing


Successfully merging this pull request may close these issues.

.loc and .iloc returns different dtypes on empty dataframe
3 participants