period_range creates wrong dates when freq has multiple offsets #13730

Closed
achabotl opened this issue Jul 20, 2016 · 10 comments
Labels
Bug Period Period data type

@achabotl

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> pd.__version__
u'0.18.0'
>>> pd.period_range('2016-07-20', periods=4, freq='2H')
PeriodIndex(['2016-07-20 00:00', '2016-07-20 02:00', '2016-07-20 04:00',
             '2016-07-20 06:00'],
            dtype='int64', freq='2H')
>>> pd.period_range('2016-07-20', periods=4, freq='2H30min')
PeriodIndex(['1970-10-11 08:48', '1970-10-11 08:50', '1970-10-11 08:52',
             '1970-10-11 08:54'],
            dtype='int64', freq='150T')
>>> pd.period_range('2016-07-20', periods=4, freq='2H30T')
PeriodIndex(['1970-10-11 08:48', '1970-10-11 08:50', '1970-10-11 08:52',
             '1970-10-11 08:54'],
            dtype='int64', freq='150T')
>>> pd.period_range('2016-07-20', periods=2, freq='1D1H')
PeriodIndex(['1971-12-10 10:00', '1971-12-10 11:00'], dtype='int64', freq='25H')

date_range works fine:

>>> pd.date_range('2016-07-20', periods=4, freq='2H30T')
DatetimeIndex(['2016-07-20 00:00:00', '2016-07-20 02:30:00',
               '2016-07-20 05:00:00', '2016-07-20 07:30:00'],
              dtype='datetime64[ns]', freq='150T')

Expected Output

The freq is right, but I'd expect the PeriodIndex to start at 2016-07-20, not in 1970, and to display 2h30 increments, something like:

PeriodIndex(['2016-07-20 00:00', '2016-07-20 02:30', '2016-07-20 05:00',
             '2016-07-20 07:30'],
            dtype='int64', freq='150T')

Or, if freqs that combine multiple offsets are not supported, to at least raise an error.
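The expected values are simply the start plus whole multiples of the combined 2h30 step, the same values `date_range` gives above. A minimal stdlib sketch of that arithmetic (the helper `expected_labels` is hypothetical, not a pandas function):

```python
from datetime import datetime, timedelta


def expected_labels(start, periods, step):
    """Labels a correct period_range would show: start + k * step, k = 0..periods-1."""
    return [(start + k * step).strftime("%Y-%m-%d %H:%M") for k in range(periods)]


labels = expected_labels(datetime(2016, 7, 20), 4, timedelta(hours=2, minutes=30))
print(labels)
```

This prints the four labels from the expected PeriodIndex above, 2016-07-20 00:00 through 07:30.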

output of pd.show_versions()

INSTALLED VERSIONS


commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.2
setuptools: 23.1.0
Cython: 0.24
numpy: 1.10.4
scipy: 0.17.1
statsmodels: 0.6.1
xarray: 0.7.2
IPython: 4.1.2
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.2
pytz: 2016.3
blosc: 1.2.8
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.4
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: 0.999
httplib2: 0.9.2
apiclient: None
sqlalchemy: 1.0.12
pymysql: 0.6.7.None
psycopg2: None
jinja2: 2.8
boto: 2.40.0

@jorisvandenbossche
Member

Confirmed on master, thanks for reporting!

@jorisvandenbossche jorisvandenbossche added Bug Period Period data type labels Jul 21, 2016
@jorisvandenbossche jorisvandenbossche added this to the Next Major Release milestone Jul 21, 2016
@jorisvandenbossche
Member

For now, of course, you can use pd.period_range('2016-07-20', periods=4, freq='150T').

Before 0.17 this raised, as multiples of a freq were not supported (#7832). When that feature was added, combinations were probably not checked.

cc @sinhrks

@sinhrks
Member

sinhrks commented Jul 21, 2016

Yeah, Period can't support combinations under the current implementation. It should either raise or coerce to a single freq.

@jorisvandenbossche
Member

@sinhrks In some way it already does coerce the combination correctly to a single freq: you can see 150T in the output of the examples.

@sinhrks
Member

sinhrks commented Jul 21, 2016

@jorisvandenbossche Yes, "150T" should work after #7832. What I meant is coercing "2H30T" to "150T" internally.
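A minimal sketch of that kind of coercion, assuming only fixed-length units (D, H, T/min, S) need handling. `coerce_to_single_freq` is a hypothetical helper for illustration, not pandas code; pandas has its own parser for freq strings (e.g. `pandas.tseries.frequencies.to_offset`).

```python
import re

# Length of each supported unit in seconds (assumed subset of the freq aliases).
UNIT_SECONDS = {"D": 86400, "H": 3600, "T": 60, "min": 60, "S": 1}
# Alias to emit for the smallest unit found in the string.
ALIAS_FOR_SECONDS = {86400: "D", 3600: "H", 60: "T", 1: "S"}
# One component: an optional multiple followed by a unit.
TOKEN = re.compile(r"(\d*)(min|[DHTS])")


def coerce_to_single_freq(freq):
    """Collapse a compound freq like '2H30T' into a single one like '150T'.

    Raises ValueError for anything that is not a run of fixed-length units.
    """
    pos, total, smallest = 0, 0, None
    for m in TOKEN.finditer(freq):
        if m.start() != pos:  # unparsed characters between components
            raise ValueError(f"cannot parse freq {freq!r}")
        n = int(m.group(1) or 1)
        secs = UNIT_SECONDS[m.group(2)]
        total += n * secs
        smallest = secs if smallest is None else min(smallest, secs)
        pos = m.end()
    if pos != len(freq) or smallest is None:
        raise ValueError(f"cannot parse freq {freq!r}")
    # Every unit is a whole multiple of the smallest one, so this is exact.
    return f"{total // smallest}{ALIAS_FOR_SECONDS[smallest]}"
```

For example, `coerce_to_single_freq("2H30min")` returns `"150T"`, and both `"1D1H"` and `"1H1D"` return `"25H"`, so the order of the components no longer matters once the string is collapsed.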

@jorisvandenbossche
Member

Yep, I understood that :-) What I meant is that somewhere this coercing already happens (only not in the right place for letting this work), since:

In [44]:  pd.period_range('2016-07-20', periods=4, freq='2H30min').freq
Out[44]: <150 * Minutes>

gives the correct freq, just not the correct values.

@sinhrks
Member

sinhrks commented Jul 21, 2016

Ah, I see... I hadn't understood the phenomenon.

@agraboso
Contributor

agraboso commented Jul 31, 2016

A question about this has been posted on StackOverflow today.

An interesting observation made there is that

pd.period_range(start='2016-01-01 10:00', freq = '1H1D', periods = 10)

gives the following output (not actually correct; see @sinhrks's comment below):

PeriodIndex(['2016-01-01 10:00', '2016-01-01 11:00', '2016-01-01 12:00',
             '2016-01-01 13:00', '2016-01-01 14:00', '2016-01-01 15:00',
             '2016-01-01 16:00', '2016-01-01 17:00', '2016-01-01 18:00',
             '2016-01-01 19:00'],
            dtype='int64', freq='25H')

while

pd.period_range(start='2016-01-01 10:00', freq = '1D1H', periods = 10)

(notice the reversal of the freq string) does not.

PeriodIndex(['1971-12-02 01:00', '1971-12-02 02:00', '1971-12-02 03:00',
             '1971-12-02 04:00', '1971-12-02 05:00', '1971-12-02 06:00',
             '1971-12-02 07:00', '1971-12-02 08:00', '1971-12-02 09:00',
             '1971-12-02 10:00'],
            dtype='int64', freq='25H')
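One reading of the wrong values, offered as a hypothesis (it is not stated in this thread or taken from the pandas source): the start ordinal and the step appear to be computed from the first offset in the freq string, then read back in the base unit of the coerced single freq. For '2H30min', 2016-07-20 is hourly ordinal 408048; read as minutes since the epoch, that is 1970-10-11 08:48, and the step of 2 gives 2-minute increments. The stdlib sketch below (`modelled_output` is a hypothetical name) reproduces every buggy output quoted in this issue.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
# Length in seconds of the base units that appear in this thread.
UNIT_SECONDS = {"D": 86400, "H": 3600, "T": 60}


def ordinal(ts, unit):
    """Whole units elapsed since the epoch, floored, like a Period ordinal."""
    return int((ts - EPOCH).total_seconds()) // UNIT_SECONDS[unit]


def modelled_output(start, periods, first_n, first_unit, coerced_unit):
    """Hypothetical model of the 0.18 behaviour: start ordinal and step come
    from the first offset (first_n * first_unit), but the ordinals are then
    interpreted in the base unit of the coerced single freq."""
    start_ord = ordinal(start, first_unit)
    secs = UNIT_SECONDS[coerced_unit]
    return [
        (EPOCH + timedelta(seconds=(start_ord + k * first_n) * secs)).strftime("%Y-%m-%d %H:%M")
        for k in range(periods)
    ]


# '2H30min' -> first offset 2H, coerced freq 150T (minute base)
print(modelled_output(datetime(2016, 7, 20), 4, 2, "H", "T"))
# '1D1H' with start '2016-01-01 10:00' -> first offset 1D, coerced freq 25H (hour base)
print(modelled_output(datetime(2016, 1, 1, 10), 3, 1, "D", "H"))
```

With '1H1D' the first offset and the coerced freq share the hourly base, so the start comes out right and only the step (1 instead of 25) is wrong, which matches @sinhrks's point.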

@sinhrks
Member

sinhrks commented Jul 31, 2016

@agraboso Both of your examples look incorrect. The 1st one should step by 25H rather than 1H. The 1st and 2nd should output the same result.

@agraboso
Contributor

@sinhrks You're right, of course. I looked at the first element and the hour on the rest and thought it was fine, but it is not.

@jreback jreback modified the milestones: 0.19.0, Next Major Release Aug 2, 2016
@jreback jreback closed this as completed in 81819b7 Aug 8, 2016