document is_copy #18799

amueller · 2017-12-15T21:34:51Z

Code Sample, a copy-pastable example if possible

Problem description

I find the behavior of SettingWithCopyWarning quite surprising, but I guess that's just what it is.
It would be great if you could document is_copy and how to use it, though.

Whenever any function returns a dataframe, it seems like it should make sure that is_copy is set to False (or None?) so the user doesn't get a warning if they change it - if you're returning a dataframe, it's unlikely that the user expects this to be a view, and you're not doing chained assignments.

The is_copy attribute has an empty docstring in the docs and I couldn't find any explanation of it on the website (via google). The only thing that told me that overwriting this attribute is actually the right thing to do (again, which is pretty weird to me), was #6025 (comment)


jreback · 2017-12-15T23:24:38Z

you should certainly not be using this. This was always supposed to be an internal attribute, I am going to deprecate it.

you can avoid the warning very easily by just doing a copy on filtered results, or by using assign rather than indexing.

e.g.

df = df[mask].assign(foo=....)

is the pattern

or you can just turn the warning completely off in a context manager.

closing in favor of a deprecation issue #18801
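
For readers following along, the three options mentioned in this comment can be sketched roughly as below. This is only an illustration under assumptions: the frame, the mask, and the column name foo with value 0 are hypothetical stand-ins for the elided details above, and the context manager shown is the standard-library warnings.catch_warnings, which silences every warning inside the block, not just this one. pandas releases from that era also expose a mode.chained_assignment option that can be set through pd.option_context.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})
mask = df["A"] > 1  # hypothetical filter, not from the thread

# Pattern 1: copy the filtered result explicitly, then add the column.
filtered = df[mask].copy()
filtered["foo"] = 0  # hypothetical column name and value

# Pattern 2: let assign build a new DataFrame with the extra column.
assigned = df[mask].assign(foo=0)

# Pattern 3: silence warnings around the assignment with a context manager.
# Note that this hides all warnings raised inside the block.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    silenced = df[mask]
    silenced["foo"] = 0
```

All three leave the original df untouched and produce the same filtered frame with the extra column; they differ only in where the copy is made explicit and in whether a warning can surface.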

amueller · 2017-12-16T00:24:32Z

Thank you for your reply, but I don't follow. The result is already a copy, right? Why do I need to copy it again? I can't turn it off in a context manager since I hand the result to the user, and the warning is raised in the user code.


amueller · 2017-12-16T00:26:30Z

I can also not assign, as this is again not in my control. I'm doing a masking operation, and I want to return the masked df over to the user.

jorisvandenbossche · 2017-12-18T10:21:12Z

@amueller Could you give a small illustrative example?

Discussion about deprecation itself is in #18801

jreback · 2017-12-18T12:01:12Z

Canonically, this is very easy to work with: simply .copy() after a filter assignment. Agreed that this is not the most intuitive thing, but there are many edge cases; copy-on-write fixes this but won't be available in pandas 1.

In [1]: df = pd.DataFrame({'A':[1,2,3]})

In [2]: df2 = df[df.A>2]

In [3]: df2['B'] = 2
/Users/jreback/miniconda3/envs/pandas/bin/ipython:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  #!/Users/jreback/miniconda3/envs/pandas/bin/python

In [4]: df2
Out[4]: 
   A  B
2  3  2

use .copy() (or .assign())

In [1]: df = pd.DataFrame({'A':[1,2,3]})

In [2]: df2 = df[df.A>2].copy()

In [3]: df2['B'] = 2

amueller · 2017-12-18T16:03:12Z

This was about train_test_split in sklearn, see scikit-learn/scikit-learn#8723

But basically the pattern is:

# library code:

def discard_less_than_zero(df):
    return df[df.A >= 0] 

# user code
df = pd.DataFrame({'A':[1,2,3]})
df2 = discard_less_than_zero(df)
df2['B'] = 2

This is of course a contrived example, but I think the same applies whenever you have a library method that returns a sliced dataframe. If copy is the canonical solution, that's fine.

# library code:

def discard_less_than_zero(df):
    return df[df.A >= 0].copy()

should do it. It just seems conceptually odd. If I understand the warning correctly, this means df[df.A >= 0] is copied twice, right? It warns me that df[df.A >= 0] is a copy, and to get rid of that warning I copy it. (unless .copy() doesn't actually copy?).

If df is on the order of magnitude of the free memory, doing an additional copy can mean not being able to work on certain datasets.

And regarding the deprecation, I'm not married to any method. I just want a canonical way to solve the issue I described above, ideally without making unnecessary copies. I phrased the issue the way I did because the only information I could find was #6025 (comment), in which @jreback suggests using is_copy, so I thought this was the canonical way of doing this.
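
A minimal sketch of how the library pattern above could be checked, assuming a toy frame with one negative value (hypothetical data, not from the thread). It records any warnings raised while user code mutates the returned frame and uses np.shares_memory to see whether the returned column still overlaps the input's memory. It does not measure how many copies are made internally.

```python
import warnings

import numpy as np
import pandas as pd


def discard_less_than_zero(df):
    # Filter with a boolean mask, then take the explicit copy that is
    # recommended earlier in the thread.
    return df[df["A"] >= 0].copy()


df = pd.DataFrame({"A": [-1, 2, 3]})  # hypothetical input data

# Record every warning raised while the "user code" mutates the result.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df2 = discard_less_than_zero(df)
    df2["B"] = 2  # user code adding a column to the returned frame

warning_names = [w.category.__name__ for w in caught]

# Check whether the returned column overlaps the input column in memory.
shares = np.shares_memory(df["A"].to_numpy(), df2["A"].to_numpy())
```

With the explicit copy in the library function, warning_names should contain no SettingWithCopyWarning and shares should be false, so the user can mutate df2 without affecting df.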

jreback closed this as completed Dec 15, 2017

jreback added the Compat (pandas objects compatibility with NumPy or Python functions) and Usage Question labels Dec 15, 2017

jreback added this to the Next Major Release milestone Dec 15, 2017

jorisvandenbossche mentioned this issue Dec 18, 2017

DEPR: is_copy #18801

Closed

amueller mentioned this issue Dec 18, 2017

train_test_split on Pandas Dataframe can lead to SetttingWithCopyWarning scikit-learn/scikit-learn#8723

Closed

jorisvandenbossche mentioned this issue Jan 12, 2018

pandas.DataFrame.is_copy has no description in the docs #19210

Closed

jorisvandenbossche mentioned this issue Jan 24, 2020

Raising SettingWithCopyWarning in cross_validate scikit-learn/scikit-learn#16191

Closed
