-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ERR: Raise ValueError when setting scalars in a dataframe with no index ( #16823) #16968
Conversation
Hello @alanbato! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on October 08, 2017 at 16:21 Hours UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. if you'd reverse the code as show. ping on green.
pandas/core/frame.py
Outdated
self._data = self._data.reindex_axis(value.index.copy(), axis=1, | ||
fill_value=np.nan) | ||
if not len(self.index): | ||
if is_list_like(value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this better if you reverse the tests, IOW do
if not is_like_like(value):
raise .....
try:
....
Do you know why the CI checks are failing? |
u just click thru |
can you rebase |
can you rebase. code looks ok . |
@jreback Sorry, I didn't see the previous comments. I rebased, but I still think some tests were broken due to this new error raising and not being caught in them. should I change these tests? |
yes if we have tests doing this then they should be changed |
Okay, I will change the failing tests once I build the C extensions that for some reason stop working everytime I reboot 🤔 Thanks Jeff! |
we have been merging a fair amount of extensions (c code) recently ; always a good practice to build extensions when u pull (and if nothing changes it doesn't do anythinh) |
The tests pass now, but I feel some previous behavior will need more suitable errors (like the PD: Thanks, I thought it had to do with my machine. Will rebuild every pull from now on then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the crosstab case is wrong. that should work; pls investigate
pandas/tests/reshape/test_pivot.py
Outdated
|
||
tm.assert_frame_equal(actual, expected) | ||
# GH16823 | ||
# Setting a column with a scalar and no index should raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
somethings wrong here, this should not hit this path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case you can give me some more pointers, here's the pytest failure showing the path that it takes to the error
def test_crosstab_no_overlap(self):
# GS 10291
s1 = pd.Series([1, 2, 3], index=[1, 2, 3])
s2 = pd.Series([4, 5, 6], index=[4, 5, 6])
> actual = crosstab(s1, s2)
pandas/tests/reshape/test_pivot.py:1227:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/core/reshape/pivot.py:458: in crosstab
df['__dummy__'] = 0
pandas/core/frame.py:2459: in __setitem__
self._set_item(key, value)
pandas/core/frame.py:2529: in _set_item
self._ensure_valid_index(value)```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's because the intersection on columns and rows of both series is producing an empty dataframe (thus 0-len index) and trying to set df['__dummy__'] = 0
is the line at fault. Should the crosstab fn with two non-intersecting series work correctly, or should we catch that exception before?
If it is possible, how should the output look like? Maybe we can change the behavior so that it never has to work with an empty dataframe at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to set the __dummy__
unless len(df)
.
@@ -523,24 +523,16 @@ def f(): | |||
def test_partial_set_empty_frame_row(self): | |||
# GH5720, GH5744 | |||
# don't create rows when empty | |||
expected = DataFrame(columns=['A', 'B', 'New'], | |||
index=pd.Index([], dtype='int64')) | |||
expected['A'] = expected['A'].astype('int64') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok this is fine here.
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -481,6 +481,7 @@ Other API Changes | |||
- :class:`Period` is now immutable, and will now raise an ``AttributeError`` when a user tries to assign a new value to the ``ordinal`` or ``freq`` attributes (:issue:`17116`). | |||
- :func:`to_datetime` when passed a tz-aware ``origin=`` kwarg will now raise a more informative ``ValueError`` rather than a ``TypeError`` (:issue:`16842`) | |||
- Renamed non-functional ``index`` to ``index_col`` in :func:`read_stata` to improve API consistency (:issue:`16342`) | |||
- Setting on a column with a scalar value and no index now raises a ``ValueError`` (:issue:`16823`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
say 0-len index.
Codecov Report
@@ Coverage Diff @@
## master #16968 +/- ##
==========================================
- Coverage 91.26% 91.24% -0.02%
==========================================
Files 163 163
Lines 49978 49980 +2
==========================================
- Hits 45611 45604 -7
- Misses 4367 4376 +9
Continue to review full report at Codecov.
|
23a346f
to
afb79cd
Compare
@jreback Ping! 🌵✔️ |
@@ -454,6 +454,8 @@ def crosstab(index, columns, values=None, rownames=None, colnames=None, | |||
|
|||
from pandas import DataFrame | |||
df = DataFrame(data, index=common_idx) | |||
if not len(df): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than do this (which is incorrect because the result doesn't have the correct index).
just change next statement to
if len(df) and values is None:
....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I tried that and got an assertion error because the test was using a new empty dataframe as expected value, and the return value was an empty dataframe with a different index. I'll check again, and if it still breaks I'll change the expected value accordingly, assuming no other tests break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the statement to if len(df) and values is None:
stills makes the crosstab raise the ValueError I'm implementing in this PR, because in this test len(df) is 0 and values IS None, so when it falls to:
else:
df['__dummy__'] = values
kwargs = {'aggfunc': aggfunc}
it's basically doing df['__dummy__'] = None
which is what raises the error.
What I think would fix it and preserve the index would be this:
if not len(df):
return df
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually just return did here i think is ok (and u had that before)
change and ping on green
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, I'm not sure I understand what you mean. Should I change it back to return an empty DataFrame?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no change back to
if not len(df)
return df
you need to return the DataFrame with the correct index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, changed it back :)
pandas/core/reshape/pivot.py
Outdated
@@ -454,6 +454,8 @@ def crosstab(index, columns, values=None, rownames=None, colnames=None, | |||
|
|||
from pandas import DataFrame | |||
df = DataFrame(data, index=common_idx) | |||
if not len(df): | |||
return DataFrame() | |||
if values is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add a test for this case. I don't think we have coverage as your change didn't break things (but should have). (I think this was a crosstabe of empty frames?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a crosstab of non-overlaping frames, which was broken by this fix and now returns an empty frame (which was the expected value in that test case).
I'll add a test case with empty frames just to be sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great. the key is that the index IS preserved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I mean that the test this was hitting was maybe not right, your change here should have broken it
Rewrote the behavior on non-overlaping frames and redid the test to make it pass as it should. |
lgtm. ping on green. |
@jreback Ping! 🌵 ☑️ |
@alanbato the CI tests failed. https://travis-ci.org/pandas-dev/pandas/jobs/283492196#L1377 looks relevant to your changes here. |
@TomAugspurger Yes, now I remember why I changed that return value, haha. But the output I'm getting from this non-overlapping crosstab is |
I pushed a fix here. |
I reverted to the previous. |
thanks @alanbato |
Thank you @jreback! |
…h no index ( pandas-dev#16823) (pandas-dev#16968)" This reverts commit f9ba6fe.
…h no index ( pandas-dev#16823) (pandas-dev#16968)" (pandas-dev#17902) * Revert "ERR: Raise ValueError when setting scalars in a dataframe with no index ( pandas-dev#16823) (pandas-dev#16968)" This reverts commit f9ba6fe. * TST: expicit test on setting scalars on empty frame closes pandas-dev#17894
…h no index ( pandas-dev#16823) (pandas-dev#16968)" (pandas-dev#17902) * Revert "ERR: Raise ValueError when setting scalars in a dataframe with no index ( pandas-dev#16823) (pandas-dev#16968)" This reverts commit f9ba6fe. * TST: expicit test on setting scalars on empty frame closes pandas-dev#17894
…h no index ( pandas-dev#16823) (pandas-dev#16968)" (pandas-dev#17902) * Revert "ERR: Raise ValueError when setting scalars in a dataframe with no index ( pandas-dev#16823) (pandas-dev#16968)" This reverts commit f9ba6fe. * TST: expicit test on setting scalars on empty frame closes pandas-dev#17894
git diff upstream/master -u -- "*.py" | flake8 --diff
Trying to set a column with an scalar value and no index now raises a ValueError, similar to the behaviour of the DataFrame constructor.