-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: fix mutation of DTI backing Series/DataFrame #24096
Conversation
Hello @jbrockmendel! Thanks for submitting the PR.
|
assert ser._values is not dti | ||
assert dti[1] == ts | ||
|
||
def test_dt64tz_setitem_does_not_mutate_dti(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is not a series.test_block_internals; not sure if there is somewhere else this might belong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls create one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
Does this fix #6326 as well? |
It looks like it does not. This also breaks another copy/view-based test, which I'll need to look into. |
Codecov Report
@@ Coverage Diff @@
## master #24096 +/- ##
==========================================
+ Coverage 42.52% 42.52% +<.01%
==========================================
Files 161 161
Lines 51697 51699 +2
==========================================
+ Hits 21982 21986 +4
+ Misses 29715 29713 -2
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #24096 +/- ##
==========================================
- Coverage 92.2% 92.2% -0.01%
==========================================
Files 162 162
Lines 51714 51719 +5
==========================================
+ Hits 47682 47686 +4
- Misses 4032 4033 +1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't see any tests for the OP issue here (just the freq change), which AFIACT is orthogonal
pandas/core/frame.py
Outdated
@@ -494,6 +495,11 @@ def _init_dict(self, data, index, columns, dtype=None): | |||
arrays.loc[missing] = [v] * missing.sum() | |||
|
|||
else: | |||
for key in data: | |||
if isinstance(data[key], ABCIndexClass): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really? we need to deep copy everything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can be more specific and just make this ABCDatetimeIndex
, but I don't see any solution that doesn't involve making a copy here.
The first thing I tried was to make a copy in Block.setitem
(circa line 923 in internals.blocks) if we had a DatetimeIndex
. But that breaks a test because it means when we call ser.__setitem__
(or ser.loc.__setitem__
, or...), we fail to change other Series that are views on ser
.
AFAICT the only way to do do this is to avoid ever having a DatetimeIndex
and a Series
/column share data. This may be simpler to resolve after Blocks are backed by DatetimeArray
instead of DatetimeIndex
(I'm not 100% sure on this), but ATM this is a blocker to #24074, which is itself a blocker to DatetimeArray
. We could avoid having this be a blocker for #24074 if you were OK with the pickle workaround currently there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, the pickle workaround is much worse. for now let's just do this for DTI w/tz only. otherwise this may be a lot of copying. Pls add a TODO around this though.
pandas/core/series.py
Outdated
@@ -189,6 +190,9 @@ def __init__(self, data=None, index=None, dtype=None, name=None, | |||
else: | |||
# need to copy to avoid aliasing issues | |||
data = data._values.copy() | |||
if isinstance(data, ABCDatetimeIndex): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really?? this is a specific problem to Datetime w/tz, this is too heavy handed
assert ser._values is not dti | ||
assert dti[1] == ts | ||
|
||
def test_dt64tz_setitem_does_not_mutate_dti(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls create one
Each of the tests has an assertion near the end |
lgtm. ping on green. |
Ping |
thanks! |
commit 28c61d770f6dfca6857fd0fa6979d4119a31129e Author: Tom Augspurger <tom.w.augspurger@gmail.com> Date: Thu Dec 6 12:18:19 2018 -0600 uncomment commit bae2e322523efc73a1344464f51611e2dc555ccb Author: Tom Augspurger <tom.w.augspurger@gmail.com> Date: Thu Dec 6 12:17:09 2018 -0600 maybe fixes commit 6cb4db05c9d6ceba3794096f0172cae5ed5f6019 Author: Tom Augspurger <tom.w.augspurger@gmail.com> Date: Thu Dec 6 09:57:37 2018 -0600 we back commit d97ab57fb32cb23371169d9ed659ccfac34cfe45 Merge: a117de4 b78aa8d Author: Tom Augspurger <tom.w.augspurger@gmail.com> Date: Thu Dec 6 09:51:51 2018 -0600 Merge remote-tracking branch 'upstream/master' into disown-tz-only-rebased2 commit b78aa8d Author: gfyoung <gfyoung17+GitHub@gmail.com> Date: Thu Dec 6 07:18:44 2018 -0500 REF/TST: Add pytest idiom to reshape/test_tile (pandas-dev#24107) commit 2993b8e Author: gfyoung <gfyoung17+GitHub@gmail.com> Date: Thu Dec 6 07:17:55 2018 -0500 REF/TST: Add more pytest idiom to scalar/test_nat (pandas-dev#24120) commit b841374 Author: evangelineliu <hsiyinliu@gmail.com> Date: Wed Dec 5 18:21:46 2018 -0500 BUG: Fix concat series loss of timezone (pandas-dev#24027) commit 4ae63aa Author: jbrockmendel <jbrockmendel@gmail.com> Date: Wed Dec 5 14:44:50 2018 -0800 Implement DatetimeArray._from_sequence (pandas-dev#24074) commit 2643721 Author: jbrockmendel <jbrockmendel@gmail.com> Date: Wed Dec 5 14:43:45 2018 -0800 CLN: Follow-up to pandas-dev#24100 (pandas-dev#24116) commit 8ea7744 Author: chris-b1 <cbartak@gmail.com> Date: Wed Dec 5 14:21:23 2018 -0600 PERF: ascii c string functions (pandas-dev#23981) commit cb862e4 Author: jbrockmendel <jbrockmendel@gmail.com> Date: Wed Dec 5 12:19:46 2018 -0800 BUG: fix mutation of DTI backing Series/DataFrame (pandas-dev#24096) commit aead29b Author: topper-123 <contribute@tensortable.com> Date: Wed Dec 5 19:06:00 2018 +0000 API: rename MultiIndex.labels to MultiIndex.codes (pandas-dev#23752)
Fixes (at least some) verify_integrity bugs discussed in #24074.
git diff upstream/master -u -- "*.py" | flake8 --diff