REGR: Fix interpolation on empty dataframe #35543

sanderland · 2020-08-04T12:51:33Z

Interpolation on an empty dataframe broke in 1.1 because of a change in how 'all columns are objects' is checked: all() over an empty set is True, whereas the previous code compared the object dtype count (None) against the size (0).
interpolate is a complex function and I'm not sure what the proper fix is, so I suggest keeping the empty check separate from the rest of the logic.
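
The vacuous-truth behaviour described above can be reproduced in isolation. This is a minimal sketch, assuming only that the check has the form np.all(dtypes == np.dtype(object)) as quoted later in this thread; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2])
no_columns = df[[]]  # two rows, zero columns

# With no columns, the dtypes Series is empty, so the comparison yields an
# empty boolean Series and np.all() over it is vacuously True.
all_object = bool(np.all(no_columns.dtypes == np.dtype(object)))

# The built-in all() behaves the same way on an empty iterable.
builtin_all_empty = all([])

print(all_object, builtin_all_empty)
```

Both values come out True, which is why a frame with no columns is treated as if every column had object dtype.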

Example code that broke:

import pandas as pd
df = pd.DataFrame([1,2])
df[[]].interpolate(limit_area='inside')

closes BUG: interpolate gives a TypeError on empty dataframes #35598
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry


pep8speaks · 2020-08-04T12:51:38Z

Hello @sanderland! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-08-07 19:37:15 UTC

WillAyd · 2020-08-04T15:02:03Z

Can you add a test case for this?

sanderland · 2020-08-05T07:58:17Z

Can you add a test case for this?

done

jreback

i guess this was a regression, can you add a note in 1.1.1

pandas/tests/frame/methods/test_interpolate.py

jreback · 2020-08-06T22:22:33Z

pandas/core/generic.py

@@ -6799,6 +6799,9 @@ def interpolate(

        obj = self.T if should_transpose else self

+        if obj.empty:


would prefer that this happens in the internal method itself (called on L6861)

The error happens before this call (on line 6825)

There were a few changes to interpolate in 1.1.0. I will run git bisect to determine why this is now failing.

There was a subtle change in #34752 where

if self.ndim == 2 and np.all(self.dtypes == np.dtype(object)):

was changed to

if obj.ndim == 2 and np.all(obj.dtypes == np.dtype(object)):

Does reverting this change restore the old behaviour?

the regression is from #33084, but since the error message is about DataFrame columns, I don't think the above should have been changed either. @jbrockmendel

would prefer that this happens in the internal method itself (called on L6861)

As @sanderland notes, the check for all object dtype indeed happens here before calling into the internal method, so the check for empty thus also needs to happen here

However, @sanderland I think the df.empty check is not fully correct, because it will also give True if you have columns but no rows. For example, an empty dataframe with a datetime64[ns] column will give a timedelta64[ns] column as result. That's something we should keep.

So I think we should explicitly check for no columns / no rows (alternatively could also add the check to the offending if obj.ndim == 2 and np.all(obj.dtypes == np.dtype(object)):, eg .. and obj.shape[0])
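
One way to express the explicit "no columns" check suggested here, as a hedged sketch rather than the code that was merged. The helper name all_object_columns is hypothetical and not part of pandas, and it assumes the number of columns is the relevant dimension because dtypes has one entry per column:

```python
import numpy as np
import pandas as pd


def all_object_columns(obj: pd.DataFrame) -> bool:
    """Hypothetical helper: True only if obj has columns and all are object dtype."""
    # Require at least one column so the check is not vacuously True.
    return obj.shape[1] > 0 and bool(np.all(obj.dtypes == np.dtype(object)))


# Frame with rows but no columns (the case that regressed).
no_columns = pd.DataFrame([1, 2])[[]]
# Frame with a column but no rows.
no_rows = pd.DataFrame({"a": pd.Series([], dtype="float64")})
# Frame with a single column explicitly stored as object dtype.
objects = pd.DataFrame({"a": pd.Series(["x", "y"], dtype=object)})

# DataFrame.empty is True for both "no columns" and "no rows" ...
print(no_columns.empty, no_rows.empty)
# ... whereas the column-count guard only special-cases the no-columns frame.
print(
    all_object_columns(no_columns),
    all_object_columns(no_rows),
    all_object_columns(objects),
)
```

The point of the guard is that it distinguishes the two kinds of emptiness, which a plain df.empty check does not.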

@sanderland can you try @jorisvandenbossche suggestion here

However, @sanderland I think the df.empty check is not fully correct, because it will also give True if you have columns but no rows. For example, an empty dataframe with a datetime64[ns] column will give a timedelta64[ns] column as result. That's something we should keep.

I tried this in 1.0.x and it does not return a timedelta dtype, and I don't see why interpolate would do this either. Could you suggest a test which fails with my approach but passes in 1.0.x?

Hmm, you're fully correct. I might have been mixing up the review of two PRs, as I was also reviewing a regression for diff() and maybe was testing that method here as well .. ;) (since df.diff() for datetime64 gives timedelta64, also for empty dataframe)

Forget my comment!

doc/source/whatsnew/v1.1.1.rst

Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>

jorisvandenbossche · 2020-08-17T09:37:13Z

@sanderland can you merge master once more to see if that fixes CI?

simonjayhawkins · 2020-08-17T09:42:55Z

test failure was

=========================== short test summary info ===========================
FAILED pandas/tests/computation/test_eval.py::TestAlignment::test_basic_series_frame_alignment[numexpr-python]
= 1 failed, 72824 passed, 1746 skipped, 1055 xfailed, 58 warnings in 1102.57s (0:18:22) =

probably unrelated.

restarted (azure will patch against master)

simonjayhawkins · 2020-08-17T10:23:00Z

Thanks @sanderland

simonjayhawkins · 2020-08-17T10:24:05Z

@meeseeksdev backport 1.1.x

jreback

@simonjayhawkins in the future pls don't merge these so fast

this is not what we want here

eg this should actually return self.copy()

not to mention this is not the correct place for this
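
The behaviour requested here (an empty frame yields a copy rather than the same object) can be sketched outside the pandas internals. interpolate_or_copy is a hypothetical wrapper for illustration only; it is not the change made in this PR or in the follow-up:

```python
import pandas as pd


def interpolate_or_copy(df: pd.DataFrame, **kwargs) -> pd.DataFrame:
    """Hypothetical wrapper: short-circuit on empty frames with a copy."""
    if df.empty:
        # Return a new object so callers never receive an alias of the input.
        return df.copy()
    return df.interpolate(**kwargs)


empty = pd.DataFrame([1, 2])[[]]
result = interpolate_or_copy(empty, limit_area="inside")

# The result is a distinct object with the same (empty) shape.
print(result is empty, result.shape)

# Non-empty frames are passed through to DataFrame.interpolate unchanged.
filled = interpolate_or_copy(pd.DataFrame({"a": [1.0, None, 3.0]}))
print(filled["a"].tolist())
```

Returning a copy keeps the empty case consistent with the non-empty case, where interpolate without inplace=True produces a new object.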

This reverts commit 0abfc7e.

Co-authored-by: sanderland <48946947+sanderland@users.noreply.github.com>

jorisvandenbossche · 2020-08-18T07:59:16Z

I think Simon's call to merge this was appropriate (all the existing review comments were addressed, it was reviewed by several core devs). Yes, you noticed an additional problem (the missing copy) after merge: no problem, that can always happen, and we can do a follow-up to fix this.

not to mention this is not the correct place for this

See my comment at #35543 (comment) (and @sanderland's own answer above that), your suggestion to move it down into the internals is not possible, so this function is a correct place to put the check.

Update generic.py

2b17cb2

sanderland marked this pull request as ready for review August 4, 2020 13:34

WillAyd added the Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff label Aug 4, 2020

add test

1676a43

jreback requested changes Aug 6, 2020


jreback added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Aug 6, 2020

sanderland mentioned this pull request Aug 7, 2020

BUG: interpolate gives a TypeError on empty dataframes #35598

Closed

3 tasks

Sander Land and others added 2 commits August 7, 2020 10:44

whatsnew entry, test fix

5844c89

Merge branch 'master' into patch-1

cf917b6

sanderland requested a review from jreback August 7, 2020 08:46

simonjayhawkins added this to the 1.1.1 milestone Aug 7, 2020

simonjayhawkins reviewed Aug 7, 2020


sanderland and others added 3 commits August 7, 2020 21:36

Update pandas/tests/frame/methods/test_interpolate.py

6b5245b

Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>

Update doc/source/whatsnew/v1.1.1.rst

ad4eaa7

Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>

Merge branch 'master' into patch-1

a44a4c4

jorisvandenbossche changed the title ~~Fix interpolation on empty dataframe~~ REGR: Fix interpolation on empty dataframe Aug 13, 2020

jorisvandenbossche added the Regression Functionality that used to work in a prior pandas version label Aug 13, 2020

jorisvandenbossche approved these changes Aug 17, 2020


simonjayhawkins merged commit 0abfc7e into pandas-dev:master Aug 17, 2020

meeseeksmachine mentioned this pull request Aug 17, 2020

Backport PR #35543 on branch 1.1.x (REGR: Fix interpolation on empty dataframe) #35764

Merged

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Aug 17, 2020

Backport PR pandas-dev#35543: REGR: Fix interpolation on empty dataframe

b67bca4

jreback reviewed Aug 17, 2020


simonjayhawkins added a commit that referenced this pull request Aug 17, 2020

Revert "REGR: Fix interpolation on empty dataframe (#35543)"

4134cd7

This reverts commit 0abfc7e.

This was referenced Aug 17, 2020

Revert "REGR: Fix interpolation on empty dataframe" #35766

Closed

REGR: follow-up to return copy with df.interpolate on empty DataFrame #35774

Merged

simonjayhawkins pushed a commit that referenced this pull request Aug 17, 2020

Backport PR #35543: REGR: Fix interpolation on empty dataframe (#35764)

7a5d186

Co-authored-by: sanderland <48946947+sanderland@users.noreply.github.com>

sanderland deleted the patch-1 branch August 24, 2020 14:08

simonjayhawkins mentioned this pull request Oct 1, 2020

Backport PR #36706 on branch 1.1.x (CI: npdev new exception message) #36751

Merged
