Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERR: Raise ValueError when setting scalars in a dataframe with no index ( #16823) #16968

Merged
merged 4 commits into from
Oct 8, 2017

Conversation

alanbato
Copy link
Contributor

Trying to set a column with an scalar value and no index now raises a ValueError, similar to the behaviour of the DataFrame constructor.

@pep8speaks
Copy link

pep8speaks commented Jul 15, 2017

Hello @alanbato! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on October 08, 2017 at 16:21 Hours UTC

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. if you'd reverse the code as show. ping on green.

self._data = self._data.reindex_axis(value.index.copy(), axis=1,
fill_value=np.nan)
if not len(self.index):
if is_list_like(value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this better if you reverse the tests, IOW do

if not is_like_like(value):
     raise .....

try:
    ....

@jreback jreback added Error Reporting Incorrect or improved errors from pandas Indexing Related to indexing on series/frames, not to indexes themselves labels Jul 15, 2017
@jreback jreback added this to the 0.21.0 milestone Jul 15, 2017
@alanbato
Copy link
Contributor Author

Do you know why the CI checks are failing?

@jreback
Copy link
Contributor

jreback commented Jul 16, 2017

@jreback
Copy link
Contributor

jreback commented Aug 18, 2017

can you rebase

@jreback
Copy link
Contributor

jreback commented Sep 23, 2017

can you rebase. code looks ok .

@alanbato
Copy link
Contributor Author

@jreback Sorry, I didn't see the previous comments. I rebased, but I still think some tests were broken due to this new error raising and not being caught in them. should I change these tests?

@jreback
Copy link
Contributor

jreback commented Sep 23, 2017

yes if we have tests doing this then they should be changed

@alanbato
Copy link
Contributor Author

Okay, I will change the failing tests once I build the C extensions that for some reason stop working everytime I reboot 🤔 Thanks Jeff!

@jreback
Copy link
Contributor

jreback commented Sep 24, 2017

we have been merging a fair amount of extensions (c code) recently ; always a good practice to build extensions when u pull (and if nothing changes it doesn't do anythinh)

@alanbato
Copy link
Contributor Author

alanbato commented Sep 24, 2017

The tests pass now, but I feel some previous behavior will need more suitable errors (like the crosstab one). Maybe make it a separate issue?

PD: Thanks, I thought it had to do with my machine. Will rebuild every pull from now on then

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the crosstab case is wrong. that should work; pls investigate


tm.assert_frame_equal(actual, expected)
# GH16823
# Setting a column with a scalar and no index should raise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

somethings wrong here, this should not hit this path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case you can give me some more pointers, here's the pytest failure showing the path that it takes to the error

def test_crosstab_no_overlap(self):
        # GS 10291
    
        s1 = pd.Series([1, 2, 3], index=[1, 2, 3])
        s2 = pd.Series([4, 5, 6], index=[4, 5, 6])
    
>       actual = crosstab(s1, s2)

pandas/tests/reshape/test_pivot.py:1227: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/core/reshape/pivot.py:458: in crosstab
    df['__dummy__'] = 0
pandas/core/frame.py:2459: in __setitem__
    self._set_item(key, value)
pandas/core/frame.py:2529: in _set_item
    self._ensure_valid_index(value)```

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's because the intersection on columns and rows of both series is producing an empty dataframe (thus 0-len index) and trying to set df['__dummy__'] = 0 is the line at fault. Should the crosstab fn with two non-intersecting series work correctly, or should we catch that exception before?
If it is possible, how should the output look like? Maybe we can change the behavior so that it never has to work with an empty dataframe at all?

Copy link
Contributor

@jreback jreback Sep 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to set the __dummy__ unless len(df).

@@ -523,24 +523,16 @@ def f():
def test_partial_set_empty_frame_row(self):
# GH5720, GH5744
# don't create rows when empty
expected = DataFrame(columns=['A', 'B', 'New'],
index=pd.Index([], dtype='int64'))
expected['A'] = expected['A'].astype('int64')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok this is fine here.

@@ -481,6 +481,7 @@ Other API Changes
- :class:`Period` is now immutable, and will now raise an ``AttributeError`` when a user tries to assign a new value to the ``ordinal`` or ``freq`` attributes (:issue:`17116`).
- :func:`to_datetime` when passed a tz-aware ``origin=`` kwarg will now raise a more informative ``ValueError`` rather than a ``TypeError`` (:issue:`16842`)
- Renamed non-functional ``index`` to ``index_col`` in :func:`read_stata` to improve API consistency (:issue:`16342`)
- Setting on a column with a scalar value and no index now raises a ``ValueError`` (:issue:`16823`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say 0-len index.

@codecov
Copy link

codecov bot commented Sep 24, 2017

Codecov Report

Merging #16968 into master will decrease coverage by 0.01%.
The diff coverage is 83.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16968      +/-   ##
==========================================
- Coverage   91.26%   91.24%   -0.02%     
==========================================
  Files         163      163              
  Lines       49978    49980       +2     
==========================================
- Hits        45611    45604       -7     
- Misses       4367     4376       +9
Flag Coverage Δ
#multiple 89.04% <83.33%> (ø) ⬆️
#single 40.24% <33.33%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/reshape/pivot.py 96.38% <100%> (+0.02%) ⬆️
pandas/core/frame.py 97.74% <75%> (-0.1%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/groupby.py 91.99% <0%> (-0.05%) ⬇️
pandas/core/indexes/datetimes.py 95.58% <0%> (ø) ⬆️
pandas/core/indexes/datetimelike.py 97.09% <0%> (+0.2%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e63c935...e86db7f. Read the comment docs.

@alanbato alanbato force-pushed the fix_16823 branch 2 times, most recently from 23a346f to afb79cd Compare October 1, 2017 05:23
@alanbato
Copy link
Contributor Author

alanbato commented Oct 2, 2017

@jreback Ping! 🌵✔️

@@ -454,6 +454,8 @@ def crosstab(index, columns, values=None, rownames=None, colnames=None,

from pandas import DataFrame
df = DataFrame(data, index=common_idx)
if not len(df):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than do this (which is incorrect because the result doesn't have the correct index).

just change next statement to

if len(df) and values is None:
....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I tried that and got an assertion error because the test was using a new empty dataframe as expected value, and the return value was an empty dataframe with a different index. I'll check again, and if it still breaks I'll change the expected value accordingly, assuming no other tests break.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the statement to if len(df) and values is None: stills makes the crosstab raise the ValueError I'm implementing in this PR, because in this test len(df) is 0 and values IS None, so when it falls to:

else:
        df['__dummy__'] = values
        kwargs = {'aggfunc': aggfunc}

it's basically doing df['__dummy__'] = None which is what raises the error.

What I think would fix it and preserve the index would be this:

if not len(df):
    return df

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually just return did here i think is ok (and u had that before)

change and ping on green

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, I'm not sure I understand what you mean. Should I change it back to return an empty DataFrame?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no change back to

if not len(df)
    return df

you need to return the DataFrame with the correct index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, changed it back :)

@@ -454,6 +454,8 @@ def crosstab(index, columns, values=None, rownames=None, colnames=None,

from pandas import DataFrame
df = DataFrame(data, index=common_idx)
if not len(df):
return DataFrame()
if values is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add a test for this case. I don't think we have coverage as your change didn't break things (but should have). (I think this was a crosstabe of empty frames?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a crosstab of non-overlaping frames, which was broken by this fix and now returns an empty frame (which was the expected value in that test case).
I'll add a test case with empty frames just to be sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great. the key is that the index IS preserved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I mean that the test this was hitting was maybe not right, your change here should have broken it

@alanbato
Copy link
Contributor Author

alanbato commented Oct 2, 2017

Rewrote the behavior on non-overlaping frames and redid the test to make it pass as it should.

@jreback
Copy link
Contributor

jreback commented Oct 2, 2017

lgtm. ping on green.

@jreback jreback added this to the 0.21.0 milestone Oct 2, 2017
@alanbato
Copy link
Contributor Author

alanbato commented Oct 2, 2017

@jreback Ping! 🌵 ☑️

@TomAugspurger
Copy link
Contributor

@alanbato the CI tests failed. https://travis-ci.org/pandas-dev/pandas/jobs/283492196#L1377 looks relevant to your changes here.

@alanbato
Copy link
Contributor Author

alanbato commented Oct 5, 2017

@TomAugspurger Yes, now I remember why I changed that return value, haha. But the output I'm getting from this non-overlapping crosstab is DataFrame(index=[], columns=['row_0','col_0']) and excuse me if I'm wrong but I'm not sure we want that output.
Correct me if I'm wrong, but wouldn't it make sense to return an empty dataframe when this happens?
We were trying to preserve the Index, but my understanding is that this non-overlapping chase always results in an Index: []

@jreback
Copy link
Contributor

jreback commented Oct 6, 2017

I pushed a fix here.

@jreback
Copy link
Contributor

jreback commented Oct 8, 2017

I reverted to the previous.

@jreback jreback merged commit f9ba6fe into pandas-dev:master Oct 8, 2017
@jreback
Copy link
Contributor

jreback commented Oct 8, 2017

thanks @alanbato

@alanbato alanbato deleted the fix_16823 branch October 9, 2017 20:12
@alanbato
Copy link
Contributor Author

alanbato commented Oct 9, 2017

Thank you @jreback!

ghost pushed a commit to reef-technologies/pandas that referenced this pull request Oct 16, 2017
jreback added a commit to jreback/pandas that referenced this pull request Oct 17, 2017
jreback added a commit that referenced this pull request Oct 18, 2017
…h no index ( #16823) (#16968)" (#17902)

* Revert "ERR: Raise ValueError when setting scalars in a dataframe with no index ( #16823) (#16968)"

This reverts commit f9ba6fe.

* TST: expicit test on setting scalars on empty frame

closes #17894
yeemey pushed a commit to yeemey/pandas that referenced this pull request Oct 20, 2017
…h no index ( pandas-dev#16823) (pandas-dev#16968)" (pandas-dev#17902)

* Revert "ERR: Raise ValueError when setting scalars in a dataframe with no index ( pandas-dev#16823) (pandas-dev#16968)"

This reverts commit f9ba6fe.

* TST: expicit test on setting scalars on empty frame

closes pandas-dev#17894
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
…h no index ( pandas-dev#16823) (pandas-dev#16968)" (pandas-dev#17902)

* Revert "ERR: Raise ValueError when setting scalars in a dataframe with no index ( pandas-dev#16823) (pandas-dev#16968)"

This reverts commit f9ba6fe.

* TST: expicit test on setting scalars on empty frame

closes pandas-dev#17894
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
…h no index ( pandas-dev#16823) (pandas-dev#16968)" (pandas-dev#17902)

* Revert "ERR: Raise ValueError when setting scalars in a dataframe with no index ( pandas-dev#16823) (pandas-dev#16968)"

This reverts commit f9ba6fe.

* TST: expicit test on setting scalars on empty frame

closes pandas-dev#17894
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ERR: setting a column with a scalar and no index should raise
4 participants