ERR: Raise ValueError when setting scalars in a dataframe with no index ( #16823) #16968

alanbato · 2017-07-15T22:30:07Z

closes ERR: setting a column with a scalar and no index should raise #16823
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Trying to set a column with a scalar value and no index now raises a ValueError, similar to the behaviour of the DataFrame constructor.
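For context, the constructor behaviour that the description compares against can be reproduced with a short sketch. It assumes pandas is installed; the column name `a` and the helper name `constructor_rejects_bare_scalar` are made up for illustration and are not part of this PR.

```python
import pandas as pd


def constructor_rejects_bare_scalar():
    """Return True if building a frame from only a scalar raises ValueError."""
    try:
        # No index is given, so there is nothing to broadcast the scalar over.
        pd.DataFrame({"a": 1})
    except ValueError:
        return True
    return False


# With an explicit index the same scalar is broadcast to every row.
broadcast = pd.DataFrame({"a": 1}, index=[0, 1, 2])

print(constructor_rejects_bare_scalar(), len(broadcast))
```

The PR aims to make `df['a'] = 1` on a frame with a 0-len index fail in the same way as the first call.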

pep8speaks · 2017-07-15T22:30:11Z

Hello @alanbato! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on October 08, 2017 at 16:21 Hours UTC

jreback

lgtm. if you'd reverse the code as shown. ping on green.

jreback · 2017-07-15T23:40:56Z

pandas/core/frame.py

-            self._data = self._data.reindex_axis(value.index.copy(), axis=1,
-                                                 fill_value=np.nan)
+        if not len(self.index):
+            if is_list_like(value):


I like this better if you reverse the tests, IOW do

    if not is_list_like(value):
        raise .....
    try:
        ....
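The ordering suggested here (reject scalars first, then attempt the conversion) could look roughly like the sketch below. It is a pure-Python illustration, not the code in `pandas/core/frame.py`: `is_list_like` is a simplified stand-in for the pandas helper, and `ensure_valid_index`, its arguments, and the error messages are hypothetical.

```python
from collections.abc import Iterable


def is_list_like(value):
    # Simplified stand-in (assumption): iterable, but not a string or bytes.
    return isinstance(value, Iterable) and not isinstance(value, (str, bytes))


def ensure_valid_index(index_len, value):
    """Hypothetical guard: return the row labels a frame would end up with."""
    if index_len:
        # The frame already has an index, so any value can be aligned to it.
        return list(range(index_len))
    if not is_list_like(value):
        # Scalar on a 0-len index: fail early, before trying anything else.
        raise ValueError("cannot set a scalar on a frame with a 0-len index")
    try:
        return list(range(len(value)))
    except TypeError:
        # List-like but without a length (e.g. a generator).
        raise ValueError("cannot infer an index from this value") from None


print(ensure_valid_index(0, [10, 20]))
```

Putting the scalar check first keeps the error path flat and leaves the `try` block covering only the conversion that can actually fail.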

alanbato · 2017-07-16T18:49:41Z

Do you know why the CI checks are failing?

jreback · 2017-07-16T19:05:38Z

u just click thru

https://travis-ci.org/pandas-dev/pandas/jobs/254061811

jreback · 2017-08-18T01:03:14Z

can you rebase

jreback · 2017-09-23T20:30:15Z

can you rebase. code looks ok .

alanbato · 2017-09-23T23:37:29Z

@jreback Sorry, I didn't see the previous comments. I rebased, but I think some tests still fail because they trigger the new error without catching it. Should I change these tests?

jreback · 2017-09-23T23:43:41Z

yes if we have tests doing this then they should be changed

alanbato · 2017-09-23T23:52:06Z

Okay, I will change the failing tests once I build the C extensions, which for some reason stop working every time I reboot 🤔 Thanks Jeff!

jreback · 2017-09-24T12:42:18Z

we have been merging a fair amount of extensions (c code) recently; always a good practice to build extensions when u pull (and if nothing changes it doesn't do anything)

alanbato · 2017-09-24T18:08:59Z

The tests pass now, but I feel some previous behavior will need more suitable errors (like the crosstab one). Maybe make it a separate issue?

PS: Thanks, I thought it had to do with my machine. Will rebuild on every pull from now on then

jreback

I think the crosstab case is wrong. that should work; pls investigate

jreback · 2017-09-24T18:47:01Z

pandas/tests/reshape/test_pivot.py

-
-        tm.assert_frame_equal(actual, expected)
+        # GH16823
+        # Setting a column with a scalar and no index should raise


something's wrong here, this should not hit this path.

In case you can give me some more pointers, here's the pytest failure showing the path that it takes to the error

    def test_crosstab_no_overlap(self):
        # GS 10291
        s1 = pd.Series([1, 2, 3], index=[1, 2, 3])
        s2 = pd.Series([4, 5, 6], index=[4, 5, 6])
>       actual = crosstab(s1, s2)

pandas/tests/reshape/test_pivot.py:1227:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/core/reshape/pivot.py:458: in crosstab
    df['__dummy__'] = 0
pandas/core/frame.py:2459: in __setitem__
    self._set_item(key, value)
pandas/core/frame.py:2529: in _set_item
    self._ensure_valid_index(value)

I think it's because the intersection of the two series' indexes is empty, which produces an empty dataframe (thus a 0-len index), and df['__dummy__'] = 0 is the line at fault. Should the crosstab fn with two non-intersecting series work correctly, or should we catch that exception before?
If it is possible, what should the output look like? Maybe we can change the behavior so that it never has to work with an empty dataframe at all?

I don't think you need to set the __dummy__ unless len(df).
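The failing case from the traceback can be reproduced outside the test suite. The sketch below assumes pandas is installed and a version in which setting a scalar on an empty frame is allowed, so that `crosstab` completes. It only checks that the result is empty, because the exact index and column labels of the result are what the rest of this thread debates.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=[1, 2, 3])
s2 = pd.Series([4, 5, 6], index=[4, 5, 6])

# The two indexes share no labels, so the intersection has length 0.
common = s1.index.intersection(s2.index)

# crosstab aligns both inputs on the shared labels, leaving nothing to count.
result = pd.crosstab(s1, s2)

print(len(common), result.empty)
```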

jreback · 2017-09-24T18:47:24Z

pandas/tests/indexing/test_partial.py

@@ -523,24 +523,16 @@ def f():
    def test_partial_set_empty_frame_row(self):
        # GH5720, GH5744
        # don't create rows when empty
-        expected = DataFrame(columns=['A', 'B', 'New'],
-                             index=pd.Index([], dtype='int64'))
-        expected['A'] = expected['A'].astype('int64')


ok this is fine here.

jreback · 2017-09-24T18:48:09Z

doc/source/whatsnew/v0.21.0.txt

@@ -481,6 +481,7 @@ Other API Changes
 - :class:`Period` is now immutable, and will now raise an ``AttributeError`` when a user tries to assign a new value to the ``ordinal`` or ``freq`` attributes (:issue:`17116`).
 - :func:`to_datetime` when passed a tz-aware ``origin=`` kwarg will now raise a more informative ``ValueError`` rather than a ``TypeError`` (:issue:`16842`)
 - Renamed non-functional ``index`` to ``index_col`` in :func:`read_stata` to improve API consistency (:issue:`16342`)
+- Setting on a column with a scalar value and no index now raises a ``ValueError`` (:issue:`16823`)


say 0-len index.

codecov · 2017-09-24T21:33:33Z

Codecov Report

Merging #16968 into master will decrease coverage by 0.01%.
The diff coverage is 83.33%.

@@            Coverage Diff             @@
##           master   #16968      +/-   ##
==========================================
- Coverage   91.26%   91.24%   -0.02%     
==========================================
  Files         163      163              
  Lines       49978    49980       +2     
==========================================
- Hits        45611    45604       -7     
- Misses       4367     4376       +9

Flag	Coverage Δ
#multiple	`89.04% <83.33%> (ø)`	⬆️
#single	`40.24% <33.33%> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/reshape/pivot.py	`96.38% <100%> (+0.02%)`	⬆️
pandas/core/frame.py	`97.74% <75%> (-0.1%)`	⬇️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/groupby.py	`91.99% <0%> (-0.05%)`	⬇️
pandas/core/indexes/datetimes.py	`95.58% <0%> (ø)`	⬆️
pandas/core/indexes/datetimelike.py	`97.09% <0%> (+0.2%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e63c935...e86db7f. Read the comment docs.

alanbato · 2017-10-02T06:08:05Z

@jreback Ping! 🌵✔️

jreback · 2017-10-02T12:47:23Z

pandas/core/reshape/pivot.py

@@ -454,6 +454,8 @@ def crosstab(index, columns, values=None, rownames=None, colnames=None,

    from pandas import DataFrame
    df = DataFrame(data, index=common_idx)
+    if not len(df):


rather than do this (which is incorrect because the result doesn't have the correct index).

just change next statement to

if len(df) and values is None: ....

I think I tried that and got an assertion error because the test was using a new empty dataframe as expected value, and the return value was an empty dataframe with a different index. I'll check again, and if it still breaks I'll change the expected value accordingly, assuming no other tests break.

Changing the statement to if len(df) and values is None: still makes the crosstab raise the ValueError I'm implementing in this PR, because in this test len(df) is 0 and values IS None, so when it falls to:

    else:
        df['__dummy__'] = values
        kwargs = {'aggfunc': aggfunc}

it's basically doing df['__dummy__'] = None which is what raises the error.

What I think would fix it and preserve the index would be this:

if not len(df): return df
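The difference between the two guards can be traced with a small pure-Python model of the branch logic. This is only a sketch of the control flow discussed in this thread, not the code in `pandas/core/reshape/pivot.py`; the function name `assigned_dummy`, its parameters, and the `"returned df"` marker are hypothetical.

```python
def assigned_dummy(n_rows, values, early_return):
    """Model which value would be assigned to df['__dummy__'].

    Returns the string "returned df" when the function would bail out
    before any assignment happens.
    """
    if early_return:
        # Variant: `if not len(df): return df`, then the original `if`.
        if not n_rows:
            return "returned df"
        takes_if_branch = values is None
    else:
        # Variant: `if len(df) and values is None:`.
        takes_if_branch = bool(n_rows) and values is None

    if takes_if_branch:
        return 0       # models df['__dummy__'] = 0
    return values      # models df['__dummy__'] = values


# Empty frame, no values: the combined condition falls through to `else`
# and assigns None, which is the scalar assignment that raises.
print(assigned_dummy(0, None, early_return=False))  # None
print(assigned_dummy(0, None, early_return=True))   # returned df
```

For non-empty frames both variants behave the same, so only the 0-len case changes.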

see my comment below.

actually just return df here i think is ok (and u had that before)

change and ping on green

I'm sorry, I'm not sure I understand what you mean. Should I change it back to return an empty DataFrame?

no change back to

if not len(df): return df

you need to return the DataFrame with the correct index.

Ok, changed it back :)

jreback · 2017-10-02T12:48:06Z

pandas/core/reshape/pivot.py

@@ -454,6 +454,8 @@ def crosstab(index, columns, values=None, rownames=None, colnames=None,

    from pandas import DataFrame
    df = DataFrame(data, index=common_idx)
+    if not len(df):
+        return DataFrame()
    if values is None:


pls add a test for this case. I don't think we have coverage as your change didn't break things (but should have). (I think this was a crosstab of empty frames?)

It was a crosstab of non-overlapping frames, which was broken by this fix and now returns an empty frame (which was the expected value in that test case).
I'll add a test case with empty frames just to be sure.

great. the key is that the index IS preserved.

sorry, I mean that the test this was hitting was maybe not right, your change here should have broken it

alanbato · 2017-10-02T16:24:26Z

Rewrote the behavior on non-overlapping frames and redid the test to make it pass as it should.

jreback · 2017-10-02T16:26:22Z

lgtm. ping on green.

alanbato · 2017-10-02T20:42:35Z

@jreback Ping! 🌵 ☑️

TomAugspurger · 2017-10-05T19:58:04Z

@alanbato the CI tests failed. https://travis-ci.org/pandas-dev/pandas/jobs/283492196#L1377 looks relevant to your changes here.

alanbato · 2017-10-05T21:05:14Z

@TomAugspurger Yes, now I remember why I changed that return value, haha. The output I'm getting from this non-overlapping crosstab is DataFrame(index=[], columns=['row_0','col_0']), and I'm not sure we want that output.
Correct me if I'm wrong, but wouldn't it make sense to return an empty dataframe when this happens?
We were trying to preserve the Index, but my understanding is that this non-overlapping case always results in an Index: []

jreback · 2017-10-06T19:48:07Z

I pushed a fix here.

jreback · 2017-10-08T16:21:30Z

I reverted to the previous.

jreback · 2017-10-08T17:05:00Z

thanks @alanbato

alanbato · 2017-10-09T20:12:52Z

Thank you @jreback!

…h no index ( #16823) (#16968)" (#17902) * Revert "ERR: Raise ValueError when setting scalars in a dataframe with no index ( #16823) (#16968)" This reverts commit f9ba6fe. * TST: expicit test on setting scalars on empty frame closes #17894

jreback approved these changes Jul 15, 2017

jreback added the Error Reporting and Indexing labels Jul 15, 2017

jreback added this to the 0.21.0 milestone Jul 15, 2017

jreback removed this from the 0.21.0 milestone Sep 23, 2017

alanbato force-pushed the fix_16823 branch from 7328307 to 91d1aa0 on September 23, 2017 23:35

alanbato force-pushed the fix_16823 branch from 91d1aa0 to 4012248 on September 24, 2017 18:07

jreback requested changes Sep 24, 2017

alanbato force-pushed the fix_16823 branch 2 times, most recently from 23a346f to afb79cd on October 1, 2017 05:23

jreback requested changes Oct 2, 2017

alanbato force-pushed the fix_16823 branch from afb79cd to add7189 on October 2, 2017 16:20

jreback approved these changes Oct 2, 2017

jreback added this to the 0.21.0 milestone Oct 2, 2017

alanbato and others added 3 commits October 6, 2017 11:42

Raise ValueError when settings scalars with 0-len index

ae8323b

Change back to returning a df with same index

70479bd

fix test

5d254aa

jreback force-pushed the fix_16823 branch from 24a423a to 5d254aa on October 6, 2017 19:47

revert previous

e86db7f

jreback merged commit f9ba6fe into pandas-dev:master Oct 8, 2017

alanbato deleted the fix_16823 branch October 9, 2017 20:12

ghost pushed a commit to reef-technologies/pandas that referenced this pull request Oct 16, 2017

ERR: Raise ValueError when setting scalars in a dataframe with no ind…

16ee0a7

…ex ( pandas-dev#16823) (pandas-dev#16968)

jreback added a commit to jreback/pandas that referenced this pull request Oct 17, 2017

Revert "ERR: Raise ValueError when setting scalars in a dataframe wit…

c51064e

…h no index ( pandas-dev#16823) (pandas-dev#16968)" This reverts commit f9ba6fe.

alanbato added a commit to alanbato/pandas that referenced this pull request Nov 10, 2017

ERR: Raise ValueError when setting scalars in a dataframe with no ind…

1c4d72a

…ex ( pandas-dev#16823) (pandas-dev#16968)

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

ERR: Raise ValueError when setting scalars in a dataframe with no ind…

12befd0

…ex ( pandas-dev#16823) (pandas-dev#16968)
