DEPR: is_copy #18801

jreback · 2017-12-15T23:24:28Z

This has always been an internal attribute. We can simply rename it to ._is_copy and emit a deprecation warning from the is_copy property.
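A minimal sketch of the rename-plus-deprecation pattern proposed above. It uses a toy class, not pandas' actual NDFrame code; the class name `LegacyFrame`, the warning text, and the choice of `FutureWarning` are assumptions made for illustration.

```python
import warnings


class LegacyFrame:
    """Toy stand-in for a frame object (hypothetical, not pandas code)."""

    def __init__(self):
        # The real state lives under the private name.
        self._is_copy = None

    @property
    def is_copy(self):
        # The public name still works but warns on every read.
        warnings.warn(
            "Attribute 'is_copy' is deprecated; it is internal state.",
            FutureWarning,
            stacklevel=2,
        )
        return self._is_copy

    @is_copy.setter
    def is_copy(self, value):
        # Writes are forwarded to the private attribute, with a warning.
        warnings.warn(
            "Attribute 'is_copy' is deprecated; it is internal state.",
            FutureWarning,
            stacklevel=2,
        )
        self._is_copy = value
```

Existing callers that read or assign `frame.is_copy` keep working under this scheme, but each access reports the deprecation at the caller's line because of `stacklevel=2`.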


jorisvandenbossche · 2017-12-18T10:20:17Z

is_copy is used in the most popular Google hits about SettingWithCopyWarning, e.g. https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas and https://www.dataquest.io/blog/settingwithcopywarning/
(actually, in the first one you are using it yourself in an answer :-))

What is the alternative for power users in library code if they want to avoid an extra unnecessary copy?

(I never use it myself, as I just call copy() when needed, so I am personally not attached to this attribute, but I typically don't deal with data where an extra copy() is a problem. But I think the use case mentioned by Andreas in #18799 is a valid one.)


amueller · 2017-12-21T20:24:09Z

So the suggested solution is to use copy() and that introduces an additional copy? Can you please document that? It seems a bit counter-intuitive that a library needs to do an expensive operation to avoid a warning in user-space, but if that's the suggested (and documented) fix I'll live with it.

jreback · 2017-12-21T20:29:30Z

@amueller not sure there is anything to add to: http://pandas.pydata.org/pandas-docs/stable/indexing.html#returning-a-view-versus-a-copy

you are chain indexing, which violates view semantics. the point is it may work, but there are cases where it won't. without copy-on-write, you must copy.

amueller · 2017-12-21T20:34:15Z

Ok maybe I just really don't understand the documentation, which is entirely possible. My reading of the warning is that we are returning a copy here, which is the intent. Are you saying it might sometimes return a view instead?

I don't want to use view semantics, and it tells me I got a copy. I'm very happy I got a copy, it's what I wanted. If I got a view instead, I would need to copy. But I thought the warning said I got a copy, not a view.

jreback · 2017-12-21T20:37:56Z

Exactly, you have understood the point: you don't know whether it is a copy or a view on the original. That is the problem. You are doing chained operations and we can't be sure, so you get the warning. It is up to you to 1) not chain operations, or 2) defensively copy.
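The two options named above could look roughly like this. It is only a sketch: the frame, its column names, and the values are invented for illustration, and whether the unguarded version would warn depends on the pandas version in use.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# 1) Do not chain: select rows and column in a single .loc call,
#    so the assignment targets df itself.
df.loc[df["a"] > 1, "b"] = "changed"

# 2) Defensively copy: filter, then copy, then modify the result.
#    The subset is now independent of df by construction.
subset = df[df["a"] > 1].copy()
subset["c"] = subset["a"] * 10
```

After the explicit `copy()`, adding column `c` cannot affect `df`, at the cost of one extra allocation for the filtered rows.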

amueller · 2017-12-21T20:38:25Z

So would the warning also be thrown if it is a view?

amueller · 2017-12-21T20:40:13Z

If it's also thrown if it's a view, then the warning is misleading, it says "A value is trying to be set on a copy of a slice from a DataFrame". If it's not thrown on a view, then it seems like I can distinguish between view and copy, and then I should only copy if I got a view.

jreback · 2017-12-21T20:40:56Z

no, if you only have a single dtyped dataframe you won't get this. it only occurs when you filter then add a column on multiple dtypes.

amueller · 2017-12-21T20:49:04Z

The question is: can I not find out at runtime if I got a copy or a view and only copy if I got a view?

jreback · 2017-12-21T20:52:09Z

you can try by introspecting the underlying arrays (not .values)
if anything is a view you must copy
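One way to attempt the introspection suggested above is to compare buffers column by column. This is a hedged sketch: `shared_columns` is a hypothetical helper, `Series.to_numpy()` did not exist when this thread was written, and the check is only meaningful for NumPy-backed columns (for other dtypes `to_numpy()` may itself copy and hide sharing).

```python
import numpy as np
import pandas as pd


def shared_columns(left, right):
    """Return the names of common columns whose buffers overlap.

    Hypothetical helper: works per column rather than on .values,
    because .values on a mixed-dtype frame builds a new array.
    """
    common = [name for name in left.columns if name in right.columns]
    return [
        name
        for name in common
        if np.shares_memory(left[name].to_numpy(), right[name].to_numpy())
    ]


df = pd.DataFrame({"a": [1, 2, 3], "x": [0.1, 0.2, 0.3]})
independent = df.copy()

# An explicit copy shares nothing with the original.
print(shared_columns(df, independent))
```

An empty result only says that no overlap was detected for the columns checked; it is not a guarantee from pandas.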

amueller · 2017-12-21T20:54:08Z

ok. Does that mean that the warning might have been raised even though there is memory sharing?

amueller · 2017-12-21T20:56:15Z

Sorry if that question was answered by

> no, if you only have a single dtyped dataframe you won't get this. it only occurs when you filter then add a column on multiple dtypes.

but I don't know how that relates to what happens to the memory. I assume it was meant as a reply to #18801 (comment) but I don't understand how it relates to it.

jreback · 2017-12-21T21:12:32Z

Because someone could have chain-indexed, and we don't know whether views were created, we can't be sure that you are not actually looking at a view of something else. More insidious, you may have some columns that are views and some that are not. So we carry around this _is_copy flag, which is a weakref to the referent. When a copy is made we can clear it, but until then some operations may not know whether it's a view or a copy. Now, it doesn't matter until you actually try to assign something to a frame, which is when this happens.

It's not trivial and mostly edge cases, but if you are seeing the warning then you have incorrect code. It may still work, but it IS chained indexing.

use at your own risk - you should copy after filtering
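The weakref bookkeeping described in this comment can be illustrated with a toy model. This is not pandas' implementation: `ToyFrame`, `take`, and `may_alias_parent` are invented names, and the real logic has many more cases.

```python
import weakref


class ToyFrame:
    """Toy model of a frame that remembers where it was sliced from."""

    def __init__(self, data):
        self.data = list(data)
        # None means "known to be independent of any other frame".
        self._is_copy = None

    def take(self, positions):
        # A derived frame keeps a weak reference to its parent, so the
        # parent can still be garbage collected.
        child = ToyFrame(self.data[i] for i in positions)
        child._is_copy = weakref.ref(self)
        return child

    def copy(self):
        # An explicit copy starts with the flag cleared.
        return ToyFrame(self.data)

    def may_alias_parent(self):
        # Suspicious only while the flag is set and the parent is alive.
        return self._is_copy is not None and self._is_copy() is not None
```

In this model an assignment would only need to warn when `may_alias_parent()` is true, which is why an explicit `copy()` silences it.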

amueller · 2017-12-21T21:19:37Z

Alright. I feel the warning is pretty confusing since it seems to imply that we made a copy, but it only implies that there is some part of the dataframe that was copied, and we don't actually know whether we made a copy or not.

> use at your own risk - you should copy after filtering

Maybe the section in the docs that discusses this warning should say that? I don't think it says that now.

jorisvandenbossche · 2017-12-22T10:03:20Z

To repeat myself from the issue: I think @amueller's use case is a valid one that we should try to support. If not through is_copy, then in another way.
(btw, @jreback it would be nice to at least respond to my objection on the issue here before merging)

In case of sklearn's train_test_split, they are using integer positional indexing, which will (as far as I understand fancy indexing in numpy) never return a view, not even in case of DataFrames with single dtypes. So they can be sure that their subset of a frame is a copy (which they want) and a SettingWithCopyWarning should never be raised on the frames returned by that function.
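The NumPy behaviour referred to here can be checked directly. The sketch below uses plain NumPy arrays rather than DataFrames, with made-up values: basic slicing yields a view, while indexing with a list of integer positions yields a new array.

```python
import numpy as np

arr = np.arange(10)

# Basic slicing returns a view onto the same buffer.
view = arr[2:5]

# Integer (fancy) indexing returns a freshly allocated array.
fancy = arr[[2, 3, 4]]

print(np.shares_memory(arr, view))   # view overlaps arr
print(np.shares_memory(arr, fancy))  # fancy does not

# Writing to the fancy-indexed result leaves the original untouched.
fancy[0] = -1
```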

> @amueller not sure there is anything to add to: http://pandas.pydata.org/pandas-docs/stable/indexing.html#returning-a-view-versus-a-copy

Explicitly taking a copy is not mentioned in those docs, so that could certainly be added.

has2k1 · 2017-12-31T23:49:21Z

Although is_copy was meant to be used internally, it leaked out because it was a solution to a justifiable problem, i.e. you slice a dataframe, you know you created an independent dataframe, and you want no complaints. A copy() operation on the new dataframe is wasteful, especially in library code.

plotnine uses is_copy in about 20 locations, and almost every call from the user will hit is_copy at least 10 times, and the number goes up linearly depending on different factors. The example on the documentation page goes through about 80 of them.

jorisvandenbossche · 2018-01-22T08:31:15Z

@jreback Can you respond to my objections and those of others? (edit: I see now there is a little bit more discussion in #19102)

jreback · 2018-01-22T11:00:06Z

Until copy-on-write, this is simply not possible in pandas in a reliable way. We don't have full control over memory allocations or when views are actually made.

jorisvandenbossche · 2018-01-22T17:12:45Z

@has2k1 I see that for plotnine you switched to using a context manager (with pd.option_context('mode.chained_assignment', None):) around the plotting code (has2k1/plotnine@9b068b4).
Is this a satisfying solution for you?
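For readers unfamiliar with the workaround mentioned here, a small sketch of the context manager follows. The frame and column names are invented, and it assumes a pandas version in which the `mode.chained_assignment` option is still available.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Inside the block the chained-assignment check is switched off, so
# modifying the filtered frame does not emit SettingWithCopyWarning.
with pd.option_context("mode.chained_assignment", None):
    sub = df[df["a"] > 1]
    sub["c"] = sub["a"] * 10

# On exit the option returns to its previous value.
```

The setting applies to everything executed inside the `with` block, including code from other packages called there.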

has2k1 · 2018-01-22T18:00:29Z

It is an okay stopgap measure until copy-on-write is available, but as it implicitly assumes user cognisance it is not a good long term solution. Also, since the package aims to be extensible in many ways, the effects of a context manager may extend to other packages.

On the other hand, 'is_copy' was explicit: it forced the user to acknowledge the potential problem at every instance, and I think that was better in an open source environment.

znd4 · 2018-05-23T19:04:48Z

I have a different reason to want this:

I'm working on a data pipeline with large enough datasets that I'm worried about the performance hit from repeated copies. An easy way to try to control that would be something like assert transformed_dataframe.is_copy == False at the end of each unit test.

sam-cohan · 2018-05-29T01:41:18Z

Yet another feasible use case can be when trying to do multi-processing where portions of a DataFrame are processed in different processes. I was under the assumption that if I take a view, when a process is spawned, only the view will be copied over taking 2X memory. In contrast, if I make a copy, then essentially the original process now has two full copies and each process will also have the partial copy so we will end up with 3X memory requirement...

jreback added the Deprecate Functionality to remove in pandas label Dec 15, 2017

jreback added this to the 0.22.0 milestone Dec 15, 2017

jreback mentioned this issue Dec 15, 2017

document is_copy #18799

Closed

jreback added Difficulty Intermediate labels Dec 15, 2017

cbertinato pushed a commit to cbertinato/pandas that referenced this issue Dec 17, 2017

DEPR: Deprecate is_copy (pandas-dev#18801)

acf7abd

cbertinato mentioned this issue Dec 17, 2017

DEPR: Deprecate is_copy (#18801) #18812

Merged

jsexauer mentioned this issue Dec 18, 2017

DEPR: Clean up list of deprecations from prior versions #6581

Closed

1 task

cbertinato pushed a commit to cbertinato/pandas that referenced this issue Dec 18, 2017

DEPR: Deprecate is_copy (pandas-dev#18801)

9b8f2af

- Renamed 'is_copy' attribute to '_is_copy' for internal use - Setup getter and setter for 'is_copy' - Added tests for deprecation warning

amueller mentioned this issue Dec 18, 2017

train_test_split on Pandas Dataframe can lead to SetttingWithCopyWarning scikit-learn/scikit-learn#8723

Closed

jreback closed this as completed in #18812 Dec 21, 2017

jreback pushed a commit that referenced this issue Dec 21, 2017

DEPR: Deprecate is_copy (#18801) (#18812)

6d2fb3e

cbertinato mentioned this issue Dec 23, 2017

DEPR: Added is_copy to NDFrame._deprecations #18922

Merged

has2k1 mentioned this issue Jan 6, 2018

ENH: A cheap copy on a dataframe #19102

Closed

TomAugspurger mentioned this issue Dec 28, 2018

[MRG] Pandas Interoperability section scikit-learn/scikit-learn#11305

Closed

jorisvandenbossche mentioned this issue Jan 24, 2020

Raising SettingWithCopyWarning in cross_validate scikit-learn/scikit-learn#16191

Closed
