DataFrame/Series.tz_convert with copy=False modifies original data #6326

Closed
hendrics opened this issue Feb 12, 2014 · 7 comments · Fixed by #24657
Labels
Bug Timezones Timezone data dtype

Comments

@hendrics

Hi. I'm not sure whether this is a bug or something that needs to be clarified.

Consider this code:

s = pd.Series(np.arange(0,5), index=pd.date_range('20131027', periods=5, freq='1H', tz='Europe/Berlin'))
s.tz_convert("UTC", copy=False)

The index of s is still the same as before. If I do the same for a DataFrame:

d = pd.DataFrame(s)
d.tz_convert("UTC", copy=False)

This time the index of d has changed. From the code it is not clear whether DataFrame is doing the right thing either.

So is this a bug, an inconsistency, or intended behaviour?

Update:

In [1]: s = pd.Series(np.arange(0,5), index=pd.date_range('20131027', periods=5, freq='1H', tz='Europe/Berlin'))
      s.tz_convert("UTC", copy=False)
Out[1]: 
2013-10-26 22:00:00+00:00    0
2013-10-26 23:00:00+00:00    1
2013-10-27 00:00:00+00:00    2
2013-10-27 01:00:00+00:00    3
2013-10-27 02:00:00+00:00    4
Freq: H, dtype: int32

In [2]: s
Out[2]: 
2013-10-27 00:00:00+02:00    0
2013-10-27 01:00:00+02:00    1
2013-10-27 02:00:00+02:00    2
2013-10-27 02:00:00+01:00    3
2013-10-27 03:00:00+01:00    4
Freq: H, dtype: int32

In [3]: d = pd.DataFrame(s)
      d.tz_convert("UTC", copy=False)
Out[3]: 
2013-10-26 22:00:00+00:00 0
2013-10-26 23:00:00+00:00 1
2013-10-27 00:00:00+00:00 2
2013-10-27 01:00:00+00:00 3
2013-10-27 02:00:00+00:00 4

In [214]: d
Out[214]: 
2013-10-26 22:00:00+00:00 0
2013-10-26 23:00:00+00:00 1
2013-10-27 00:00:00+00:00 2
2013-10-27 01:00:00+00:00 3
2013-10-27 02:00:00+00:00 4
@jreback
Contributor

jreback commented Feb 12, 2014

looks ok in 0.13.1..

In [1]: pd.__version__
Out[1]: '0.13.1'

In [2]: s = pd.Series(np.arange(0,5), index=pd.date_range('20131027', periods=5, freq='1H', tz='Europe/Berlin'))

In [3]: s.tz_convert("UTC", copy=False)
Out[3]: 
2013-10-26 22:00:00+00:00    0
2013-10-26 23:00:00+00:00    1
2013-10-27 00:00:00+00:00    2
2013-10-27 01:00:00+00:00    3
2013-10-27 02:00:00+00:00    4
Freq: H, dtype: int64

In [4]: d = pd.DataFrame(s)

In [5]: d.tz_convert("UTC", copy=False)
Out[5]: 
                           0
2013-10-26 22:00:00+00:00  0
2013-10-26 23:00:00+00:00  1
2013-10-27 00:00:00+00:00  2
2013-10-27 01:00:00+00:00  3
2013-10-27 02:00:00+00:00  4

[5 rows x 1 columns]

@alexchamberlain

If you inspect s, it hasn't changed under 0.13.1.

>>> s = pd.Series(np.arange(0,5), index=pd.date_range('20131027', periods=5, freq='1H', tz='Europe/Berlin'))
>>> s.tz_convert(None, copy=False)
2013-10-26 22:00:00    0
2013-10-26 23:00:00    1
2013-10-27 00:00:00    2
2013-10-27 01:00:00    3
2013-10-27 02:00:00    4
Freq: H, dtype: int32
>>> s
2013-10-27 00:00:00+02:00    0
2013-10-27 01:00:00+02:00    1
2013-10-27 02:00:00+02:00    2
2013-10-27 02:00:00+01:00    3
2013-10-27 03:00:00+01:00    4
Freq: H, dtype: int32
>>> 

@jreback
Contributor

jreback commented Feb 12, 2014

why would you expect s to change? most pandas methods return a new object

the copy flag just tries to avoid copying the index when it doesn't need to; in this case it does need to, so it's irrelevant
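As a small illustration of the "returns a new object" point, here is a sketch that captures the converted result and leaves the original alone. The variable names `idx` and `converted` are only illustrative, and the frequency is written as a `pd.Timedelta` rather than the `'1H'` string from the report so that it does not depend on a particular pandas version's alias spelling.

```python
import pandas as pd

# Same data as in the report: five hourly stamps across the Berlin DST change.
idx = pd.date_range(
    "2013-10-27", periods=5, freq=pd.Timedelta(hours=1), tz="Europe/Berlin"
)
s = pd.Series(range(5), index=idx)

# tz_convert is expected to return a new object, so the result is captured.
converted = s.tz_convert("UTC")

print(s.index.tz)          # timezone of the original index
print(converted.index.tz)  # timezone of the returned object's index
```

On a version without the bug discussed here, the first print should still show the Berlin timezone and only `converted` should be in UTC.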

@hendrics
Author

DataFrame does change though. If you inspect d, it will have a new index. It might be that the index is part of the data in the DataFrame, but the behaviour is inconsistent.

Updated the comment above with the output.

@jreback
Contributor

jreback commented Feb 12, 2014

ahh..ok....will mark as a bug...thanks for the report

@jreback jreback added Bug and removed Can't Repro labels Feb 12, 2014
@jreback jreback added this to the 0.14.0 milestone Feb 12, 2014
@jreback
Contributor

jreback commented Feb 18, 2014

@hendrics

pls run this again on master. I am pretty sure this is fixed (and if you want to add an explicit test for this, that would be great)

see here: b1687b8
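A possible shape for such an explicit test is sketched below. It is not the test from the pandas test suite; the helper names `tz_convert_no_copy` and `check_original_index_untouched` are made up for this illustration. The warning filter is there on the assumption that some newer pandas releases may warn about the `copy` keyword.

```python
import warnings

import pandas as pd


def tz_convert_no_copy(obj, tz):
    """Call obj.tz_convert(tz, copy=False), ignoring any warnings it emits."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return obj.tz_convert(tz, copy=False)


def check_original_index_untouched(obj):
    """Assert that converting with copy=False leaves obj's own index as it was."""
    before = obj.index.copy()
    result = tz_convert_no_copy(obj, "UTC")
    # The returned object carries the converted index ...
    assert str(result.index.tz) == "UTC"
    # ... while the original keeps its values and its timezone.
    assert obj.index.equals(before)
    assert str(obj.index.tz) == str(before.tz)


idx = pd.date_range(
    "2013-10-27", periods=5, freq=pd.Timedelta(hours=1), tz="Europe/Berlin"
)
s = pd.Series(range(5), index=idx)

check_original_index_untouched(s)                # Series case
check_original_index_untouched(pd.DataFrame(s))  # DataFrame case
```

Running the same check on both a Series and a DataFrame covers the inconsistency reported above; on an affected version the DataFrame call would be expected to fail at the timezone assertion.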

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Apr 9, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015
@datapythonista datapythonista modified the milestones: Contributions Welcome, Someday Jul 8, 2018
@mroeschke mroeschke changed the title tz_convert with copy=False behaves differently and unexpectedly for Series and DataFrame DataFrame.tz_convert with copy=False modifies original data Jul 26, 2018
@mroeschke
Member

mroeschke commented Jan 4, 2019

The Series case is actually wrong now as well.

In [13]: s = pd.Series(np.arange(0,5), index=pd.date_range('20131027', periods=5, freq='1H', tz='Europe/Berlin'))

In [14]: s
Out[14]:
2013-10-27 00:00:00+02:00    0
2013-10-27 01:00:00+02:00    1
2013-10-27 02:00:00+02:00    2
2013-10-27 02:00:00+01:00    3
2013-10-27 03:00:00+01:00    4
Freq: H, dtype: int64

In [15]: s.tz_convert('UTC', copy=False)
Out[15]:
2013-10-26 22:00:00+00:00    0
2013-10-26 23:00:00+00:00    1
2013-10-27 00:00:00+00:00    2
2013-10-27 01:00:00+00:00    3
2013-10-27 02:00:00+00:00    4
Freq: H, dtype: int64

In [16]: s
Out[16]:
2013-10-26 22:00:00+00:00    0
2013-10-26 23:00:00+00:00    1
2013-10-27 00:00:00+00:00    2
2013-10-27 01:00:00+00:00    3
2013-10-27 02:00:00+00:00    4
Freq: H, dtype: int64

In [17]: pd.__version__
Out[17]: '0.24.0.dev0+1505.gcb31b2b09.dirty'

@mroeschke mroeschke added and then removed the labels "good first issue" and "Needs Tests" (unit tests needed to prevent regressions) Jan 4, 2019
@mroeschke mroeschke changed the title DataFrame.tz_convert with copy=False modifies original data DataFrame/Series.tz_convert with copy=False modifies original data Jan 5, 2019