DEPR: Deprecate convert parameter in take #17352

gfyoung · 2017-08-27T20:20:35Z

The parameter is not respected, nor is it a parameter in many 'take' implementations.

codecov · 2017-08-28T04:07:42Z

Codecov Report

Merging #17352 into master will decrease coverage by 0.01%.
The diff coverage is 86.66%.

@@            Coverage Diff             @@
##           master   #17352      +/-   ##
==========================================
- Coverage   91.03%   91.02%   -0.02%     
==========================================
  Files         162      162              
  Lines       49567    49577      +10     
==========================================
+ Hits        45125    45126       +1     
- Misses       4442     4451       +9

Flag	Coverage Δ
#multiple	`88.8% <86.66%> (ø)`	⬆️
#single	`40.23% <23.33%> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/groupby.py	`92.21% <100%> (ø)`	⬆️
pandas/core/sparse/series.py	`95.13% <100%> (+0.04%)`	⬆️
pandas/core/series.py	`95.01% <100%> (ø)`	⬆️
pandas/core/indexing.py	`93.94% <75%> (ø)`	⬆️
pandas/core/frame.py	`97.72% <75%> (-0.1%)`	⬇️
pandas/core/generic.py	`92% <87.5%> (+0.01%)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 473a7f3...4471360. Read the comment docs.

codecov · 2017-08-28T04:07:48Z

Codecov Report

Merging #17352 into master will decrease coverage by 0.01%.
The diff coverage is 92.45%.

@@            Coverage Diff             @@
##           master   #17352      +/-   ##
==========================================
- Coverage   91.27%   91.25%   -0.02%     
==========================================
  Files         163      163              
  Lines       49766    49785      +19     
==========================================
+ Hits        45422    45433      +11     
- Misses       4344     4352       +8

Flag	Coverage Δ
#multiple	`89.05% <92.45%> (ø)`	⬆️
#single	`40.34% <60.37%> (-0.05%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/sparse/series.py	`95.15% <100%> (+0.05%)`	⬆️
pandas/core/series.py	`95.04% <100%> (+0.1%)`	⬆️
pandas/core/groupby.py	`92.23% <100%> (ø)`	⬆️
pandas/core/frame.py	`97.73% <75%> (-0.1%)`	⬇️
pandas/core/indexing.py	`93.94% <75%> (ø)`	⬆️
pandas/core/generic.py	`92.12% <96%> (+0.04%)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 030e374...9325f21. Read the comment docs.

gfyoung · 2017-08-28T04:11:04Z

pandas/core/groupby.py

-        return self.data.take(self.sort_idx, axis=self.axis, convert=False)
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            return self.data.take(self.sort_idx, axis=self.axis, convert=False)


@jreback @jorisvandenbossche : This is a tricky one for me here. On the one hand, I want to remove convert=False, but because axis of the groupby object might be invalid when calling Series.take, this is not possible for me to do (when True, it re-indexes the value according to the length of the slice along the index).

On the other hand, this catch_warnings(), while an easy way to get things to pass for now, is not a solution I want to merge. Suggestions on how to handle this problem (or is it best that we just expose convert for all relevant implementations, though I'm not a fan).

Actually, might have found the answer myself 😄

chris-b1 · 2017-08-28T15:16:31Z

pandas/core/frame.py

@@ -3354,7 +3354,7 @@ def dropna(self, axis=0, how='any', thresh=None, subset=None,
                else:
                    raise TypeError('must specify how or thresh')

-            result = self.take(mask.nonzero()[0], axis=axis, convert=False)


I don't think these internal uses should be removed. We're guaranteed the indexer is inbounds, so it's an unnecessary performance hit to check it.

What do you mean by this? DataFrame.take doesn't even respect the convert parameter. Thus, this will have no effect on anything. 😄

Ok, this example doesn't matter b/c it's on a frame, but does with the usages of _data.take

And Series.take

I don't think there's a bounds check anymore in Series.take or _data.take because I removed all of them for that same reason. We have so many bounds checks all over the place that removing just one of them is okay now.

chris-b1 · 2017-08-28T15:30:57Z

pandas/core/series.py

        if kwargs:
            nv.validate_take(tuple(), kwargs)

-        # check/convert indicies here
-        if convert:
-            indices = maybe_convert_indices(indices, len(self._get_axis(axis)))


Not sure this is safe to remove - with non numpy types, we call to an internal take impl that doesn't bounds check. Can you try pd.Series([1,2,3], dtype='category').take([-1, -2]) on your branch?

Okay, so that breaks:

2 NaN 1 1.0 dtype: category Categories (3, int64): [1, 2, 3]

That being said, that call works without dtype='category', meaning we do handle negative indices correctly in some cases already. I might as well add that handling for category too.

jreback · 2017-08-29T12:56:46Z

So I originally added this because I wanted the ability to a) convert neg to positive, and b) bounds check; these are not the default because of perf (IOW internally we often know that we don't need either of these things, but from a public perspetive you do need this).

These shouldn't have been on the public NDFrame though (so good to remove). So the question is, do we need a ._take() to do these things? (and .take() would just be a wrapper)?

chris-b1 · 2017-08-29T13:13:37Z

I'd be in favor of a _take with options to be unsafe. It's not like boundschecking is that expensive, but it's such a core operation I hate to give up any performance.

gfyoung · 2017-08-29T15:01:53Z

I suppose that in the name of performance, that's fair. I think I would just need to implement this solely for DataFrame and Series right?

chris-b1 · 2017-08-29T19:56:22Z

Yes, that's correct. BlockManager and SingleBlockManager will need to keep their take implementation, but I would keep it unprefixed since they are already internal.

gfyoung · 2017-08-29T20:37:23Z

@chris-b1 : Sorry, somehow your answer only confused me more. Yes or no, do I need to implement _take for just Series and DataFrame ?

chris-b1 · 2017-08-29T21:30:16Z

yes

…

On Tue, Aug 29, 2017 at 3:37 PM, gfyoung ***@***.***> wrote: @chris-b1 <https://github.com/chris-b1> : Sorry, somehow your answer only confused me more. Yes or no, do I need to implement _take for just Series and DataFrame ? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#17352 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AB1b_Duk4u1gGVVH0bjyBu6xvA6OgmHvks5sdHaUgaJpZM4PD6BQ> .

gfyoung · 2017-08-30T05:49:36Z

@chris-b1 : Alright, here goes. Let's see what CI has to say about my implementation.

gfyoung · 2017-08-30T07:58:08Z

@chris-b1 @jreback : Successfully implemented _take with all green. PTAL.

jreback · 2017-08-30T09:58:20Z

pandas/core/frame.py

@@ -2033,7 +2033,7 @@ def _ixs(self, i, axis=0):
                return self.loc[:, lab_slice]
            else:
                if isinstance(label, Index):
-                    return self.take(i, axis=1, convert=True)
+                    return self.take(i, axis=1)


every reference to an internal take should be to ._take (this is pretty much all of them)

Is this really needed?
Can't we keep take and only use _take where it is needed? IMO it just makes it more complex to understand the code.

Agree with @jorisvandenbossche . I'm not sure I really understand what you mean by "reference to an internal take" since the original references are to a public function.

you have now 2 functions to do the same thing , a public and private version

it is much more clear to simply use _take internally in all cases

the public function is just an interface

otherwise a future reader will not know which to use - recipe for confusion

pls change all uses to _take

There will be confusion anyhow, because then people will wonder what the difference is with pubkic take, and in majority of the cases, there is no difference. IMO that is more clear by using the public function.

There is AFAIK only one class where negative indexing isn't handled properly, and that is Categorical. Makes we wonder whether we should revisit the option of adding supporting given this disagreement about design though (@chris-b1 thoughts?)

Let me boil it down to this (in reference to my previous comment): besides potential test failures, is there reason why we call take_1d in pandas.core.categorical instead of just calling self._codes.take(...) ?

pls change all references to _take

.take is ONLY a wrapper function and should not be used anywhere in the code base
this eliminates any possibility of confusion and is inline with the existing style
mixing usages is cause for trouble

your other references are simply not relevant here and are again purely internal uses (take_1d)
if u want to bring them up in another issue by all means

let's keep the scope to changing refs to the NDFrame .take changes

If there were multiple places where I had to use ._take(), then I think I would be more inclined to agree with you @jreback . However, at this point, there is only one place where this had to happen, and if necessary, I can easily "clear up" any confusion by adding a comment.

IMO one usage isn't convincing enough for me to convert all of other invocations (both in NDFrame and by any other pandas class to use ._take()). I think I still have to side with @jorisvandenbossche on this one.

@gfyoung if this were a new public function, and you wrote a private impl _take as well. Then you certainly would use the private impl everywhere. I don't see a difference here. Pls make the change.

jreback · 2017-08-30T09:59:09Z

pandas/core/generic.py

@@ -2058,6 +2058,25 @@ def __delitem__(self, key):
        except KeyError:
            pass

+    def _take(self, indices, axis=0, convert=True, is_copy=False):
+        """
+        The internal version of `self.take` that will house the `convert`


more descriptive doc-string with parameters and such

jreback · 2017-08-30T09:59:48Z

pandas/core/generic.py

+                result._set_is_copy(self)
+
+        return result
+
    def take(self, indices, axis=0, convert=True, is_copy=True, **kwargs):


also deprecate is_copy; that is another internal parameter; its only necessary for _take

That's a fair point. If you don't mind, I'll do this in a subsequent PR once the infrastructure for _take has been merged in.

jreback · 2017-08-30T10:00:35Z

pandas/core/series.py

-
-        See also
-        --------
-        numpy.ndarray.take


why do we have this? now that it is generic is shouldn't be needed

Have what exactly? You commented on a not-so clear part of the diff.

jreback · 2017-08-30T10:00:52Z

pandas/core/sparse/series.py

@@ -604,14 +604,38 @@ def sparse_reindex(self, new_index):

    def take(self, indices, axis=0, convert=True, *args, **kwargs):
        """
-        Sparse-compatible version of ndarray.take


use a shared doc string for all public .take()

chris-b1 · 2017-08-30T10:38:06Z

pandas/core/generic.py

+
+        self._consolidate_inplace()
+        new_data = self._data.take(indices,
+                                   axis=self._get_block_manager_axis(axis),


pass through convert

jreback · 2017-09-23T16:47:18Z

so this still needs to change virtually all uses of .take() -> ._take(). We want to limit the exposure to user-facing .take() in any code that is not explicitly user facing.

jreback · 2017-09-23T18:54:42Z

pandas/core/series.py

        if convert:
            indices = maybe_convert_indices(indices, len(self._get_axis(axis)))

        indices = _ensure_platform_int(indices)
        new_index = self.index.take(indices)
        new_values = self._values.take(indices)
-        return (self._constructor(new_values, index=new_index, fastpath=True)
-                    .__finalize__(self))


any reason you can't just call the super method here?

For ._constructor ? I was trying to preserve existing (working) code as much as possible when making these changes.

no the entire routine
give it s try

https://travis-ci.org/pandas-dev/pandas/builds/279118227

I gave it a try, and there were a bunch of failures.

ok, not really sure why it shoulnt' work. its almost the same code

gfyoung · 2017-09-23T21:29:18Z

https://travis-ci.org/pandas-dev/pandas/jobs/279023089

@jreback : Not sure why doing this conversion to _.take() is causing test failures.

xref pandas-devgh-16948. The parameter is not respected, nor is it a parameter in many 'take' implementations.

gfyoung · 2017-10-01T06:53:35Z

@jreback : I've done all of the converting (without breaking tests). PTAL.

jreback · 2017-10-01T20:05:06Z

thanks @gfyoung !

* 'master' of github.com:pandas-dev/pandas: (188 commits) Separate out _convert_datetime_to_tsobject (pandas-dev#17715) DOC: remove whatsnew note for xref pandas-dev#17131 BUG: Regression in .loc accepting a boolean Index as an indexer (pandas-dev#17738) DEPR: Deprecate cdate_range and merge into bdate_range (pandas-dev#17691) CLN: replace %s syntax with .format in pandas.core: categorical, common, config, config_init (pandas-dev#17735) Fixed the memory usage explanation of categorical in gotchas from O(nm) to O(n+m) (pandas-dev#17736) TST: add backward compat for offset testing for pickles (pandas-dev#17733) remove unused time conversion funcs (pandas-dev#17711) DEPR: Deprecate convert parameter in take (pandas-dev#17352) BUG:Time Grouper bug fix when applied for list groupers (pandas-dev#17587) BUG: Fix some PeriodIndex resampling issues (pandas-dev#16153) BUG: Fix unexpected sort in groupby (pandas-dev#17621) DOC: Fixed typo in documentation for 'pandas.DataFrame.replace' (pandas-dev#17731) BUG: Fix series rename called with str altering name rather index (GH17407) (pandas-dev#17654) DOC: Add examples for MultiIndex.get_locs + cleanups (pandas-dev#17675) Doc improvements for IntervalIndex and Interval (pandas-dev#17714) BUG: DataFrame sort_values and multiple "by" columns fails to order NaT correctly Last of the timezones funcs (pandas-dev#17669) Add missing file to _pyxfiles, delete commented-out (pandas-dev#17712) update imports of DateParseError, remove unused imports from tslib (pandas-dev#17713) ...

jreback · 2017-10-03T08:08:01Z

pandas/tests/sparse/test_series.py::TestSparseSeries::()::test_numpy_take
/home/travis/miniconda3/envs/pandas/lib/python3.5/site-packages/numpy/core/fromnumeric.py:123: FutureWarning: The 'convert' parameter is deprecated and will be removed in a future version.
return take(indices, axis, out, mode)

coming up on the 3.5 build of master

seems we r not suppressing somewhere (or prob needs to change to use _take maybe)

can u take care of @gfyoung

gfyoung · 2017-10-03T09:02:35Z

@jreback : The cause of the warning is that an older version of numpy passes in None to convert by default when calling np.take (see here).

Thus, this is a compatibility issue, as it disappears in later versions of numpy (I actually worked on fixing fromnumeric compatibility issues at some point).

I'll put a check in the test to expect that warning for older versions of numpy, as checking if not convert is pretty idiomatic, and I wouldn't want to change it just because an older version of numpy is issuing a warning when we call .take() in a non-recommended way.

Follow-up to pandas-devgh-17352.

Follow-up to gh-17352.

Follow-up to pandas-devgh-17352.

xref pandas-devgh-16948. The parameter is not respected, nor is it a parameter in many 'take' implementations.

Follow-up to pandas-devgh-17352.

xref pandas-devgh-16948. The parameter is not respected, nor is it a parameter in many 'take' implementations.

Follow-up to pandas-devgh-17352.

gfyoung added the Deprecate Functionality to remove in pandas label Aug 27, 2017

gfyoung force-pushed the take-convert-depr branch 2 times, most recently from 9e0bcad to 4471360 Compare August 28, 2017 04:07

gfyoung force-pushed the take-convert-depr branch from 4471360 to 3a1de27 Compare August 28, 2017 04:09

gfyoung commented Aug 28, 2017

View reviewed changes

gfyoung force-pushed the take-convert-depr branch 2 times, most recently from 78f0c4e to bfa4bf6 Compare August 28, 2017 05:02

chris-b1 reviewed Aug 28, 2017

View reviewed changes

jsexauer mentioned this pull request Aug 29, 2017

DEPR: Clean up list of deprecations from prior versions #6581

Closed

1 task

jreback requested changes Aug 30, 2017

View reviewed changes

chris-b1 reviewed Aug 30, 2017

View reviewed changes

gfyoung force-pushed the take-convert-depr branch 2 times, most recently from e0b209b to 35812d0 Compare September 23, 2017 10:08

gfyoung force-pushed the take-convert-depr branch from 47e1879 to f30634f Compare September 23, 2017 18:46

jreback reviewed Sep 23, 2017

View reviewed changes

gfyoung force-pushed the take-convert-depr branch from f30634f to e9c9bd9 Compare September 23, 2017 19:31

gfyoung force-pushed the take-convert-depr branch from e9c9bd9 to 41504af Compare September 23, 2017 23:49

gfyoung force-pushed the take-convert-depr branch 9 times, most recently from 889fb83 to ec299bc Compare October 1, 2017 05:05

DEPR: Deprecate convert parameter in take

9325f21

xref pandas-devgh-16948. The parameter is not respected, nor is it a parameter in many 'take' implementations.

gfyoung force-pushed the take-convert-depr branch from ec299bc to 9325f21 Compare October 1, 2017 06:13

jreback added this to the 0.21.0 milestone Oct 1, 2017

jreback approved these changes Oct 1, 2017

View reviewed changes

jreback merged commit 458c1dc into pandas-dev:master Oct 1, 2017

gfyoung deleted the take-convert-depr branch October 1, 2017 20:32

gfyoung added a commit to forking-repos/pandas that referenced this pull request Oct 3, 2017

COMPAT: Suppress .take() warning for numpy < 1.12

3a4102c

Follow-up to pandas-devgh-17352.

gfyoung added a commit to forking-repos/pandas that referenced this pull request Oct 3, 2017

COMPAT: Suppress .take() warning for numpy < 1.12

abc06e3

Follow-up to pandas-devgh-17352.

gfyoung mentioned this pull request Oct 3, 2017

COMPAT: Suppress .take() warning for numpy < 1.12 #17764

Merged

jreback pushed a commit that referenced this pull request Oct 3, 2017

COMPAT: Suppress .take() warning for numpy < 1.12 (#17764)

9ac7c51

Follow-up to gh-17352.

ghost pushed a commit to reef-technologies/pandas that referenced this pull request Oct 16, 2017

COMPAT: Suppress .take() warning for numpy < 1.12 (pandas-dev#17764)

ef52915

Follow-up to pandas-devgh-17352.

alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017

DEPR: Deprecate convert parameter in take (pandas-dev#17352)

0f12a7f

xref pandas-devgh-16948. The parameter is not respected, nor is it a parameter in many 'take' implementations.

alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017

COMPAT: Suppress .take() warning for numpy < 1.12 (pandas-dev#17764)

fbef804

Follow-up to pandas-devgh-17352.

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

DEPR: Deprecate convert parameter in take (pandas-dev#17352)

5a998a8

xref pandas-devgh-16948. The parameter is not respected, nor is it a parameter in many 'take' implementations.

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

COMPAT: Suppress .take() warning for numpy < 1.12 (pandas-dev#17764)

c4ef354

Follow-up to pandas-devgh-17352.

jreback mentioned this pull request Jul 1, 2019

DEPR: deprecations log for removed issues #13777

Closed

DEPR: Deprecate convert parameter in take #17352

DEPR: Deprecate convert parameter in take #17352

Conversation

gfyoung commented Aug 27, 2017

codecov bot commented Aug 28, 2017

Codecov Report

codecov bot commented Aug 28, 2017 • edited Loading

Codecov Report

gfyoung Aug 28, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Aug 29, 2017 • edited Loading

chris-b1 commented Aug 29, 2017

gfyoung commented Aug 29, 2017

chris-b1 commented Aug 29, 2017

gfyoung commented Aug 29, 2017

chris-b1 commented Aug 29, 2017 via email

gfyoung commented Aug 30, 2017

gfyoung commented Aug 30, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gfyoung Sep 3, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gfyoung Sep 5, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gfyoung Aug 30, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Sep 23, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gfyoung commented Sep 23, 2017

gfyoung commented Oct 1, 2017

jreback commented Oct 1, 2017

jreback commented Oct 3, 2017

gfyoung commented Oct 3, 2017 • edited Loading

codecov bot commented Aug 28, 2017 •

edited

Loading

gfyoung Aug 28, 2017 •

edited

Loading

jreback commented Aug 29, 2017 •

edited

Loading

gfyoung Sep 3, 2017 •

edited

Loading

gfyoung Sep 5, 2017 •

edited

Loading

gfyoung Aug 30, 2017 •

edited

Loading

gfyoung commented Oct 3, 2017 •

edited

Loading