Pandas should accept ISO 8601 duration #11375

Closed
femtotrader opened this issue Oct 19, 2015 · 15 comments
Labels
Enhancement, Frequency (DateOffsets), IO Data (IO issues that don't fit into a more specific label), Timedelta (Timedelta data type)

Comments

@femtotrader

Hello,

ISO 8601 defines duration.
https://en.wikipedia.org/wiki/ISO_8601#Durations

see also usage in XML Schema http://www.w3.org/TR/xmlschema-2/#d0e11648

Duration can't be defined as timedelta (or timedelta64) because

  • months don't always have 30 days
  • years don't always have 365 days

It would be nice if Pandas could accept ISO 8601 durations.

We should probably use pd.tseries.offsets.DateOffset like

pd.tseries.offsets.DateOffset(years=2, months=6, days=1)

Kind regards

@jreback
Contributor

jreback commented Oct 19, 2015

Not sure what you are intending here. Offsets are usually what you use for things like this, as they handle day/month/year counting properly. OTOH, Timedeltas are more for fixed intervals (the parser is fairly flexible; I suppose you could add months/years, but that might invite user confusion as these are fixed intervals).

@jreback added the Timedelta and Frequency labels on Oct 19, 2015
@femtotrader
Author

This issue is just an enhancement request to have a Pandas method to parse ISO 8601 duration strings

We could have for example:

pd.tseries.offsets.parse_duration("P3Y6M4DT12H30M5S")

(or something similar)
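
A minimal sketch of what such a parser could look like, using only the standard library. The name `parse_iso8601_duration` is hypothetical and not a pandas API; the sketch assumes integer components only (no decimal fractions) and returns a plain dict. The keys are chosen to match the keyword names used with `DateOffset` elsewhere in this thread, so the result could plausibly be passed on as keyword arguments.

```python
import re

# Hypothetical helper, not part of pandas: splits an ISO 8601 duration
# such as "P3Y6M4DT12H30M5S" into named integer components.
_DURATION_RE = re.compile(
    r"^P(?:(?P<years>\d+)Y)?"
    r"(?:(?P<months>\d+)M)?"
    r"(?:(?P<weeks>\d+)W)?"
    r"(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?"
    r"(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+)S)?)?$"
)


def parse_iso8601_duration(text):
    """Return the components present in `text` as a dict of ints."""
    match = _DURATION_RE.match(text)
    # Reject strings with no components at all, or a dangling "T".
    if match is None or text == "P" or text.endswith("T"):
        raise ValueError(f"not a supported ISO 8601 duration: {text!r}")
    return {
        name: int(value)
        for name, value in match.groupdict().items()
        if value is not None
    }


parts = parse_iso8601_duration("P3Y6M4DT12H30M5S")
print(parts)
```

Note that `M` means months before the `T` separator and minutes after it, which is why the pattern treats the date and time parts separately.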

@jreback
Contributor

jreback commented Oct 20, 2015

This is pretty trivial actually. Just need to add a bit to the unit map (e.g. accept DT for days), the P and Y as well.

In [4]: pd.Timedelta("4D12H30M5S")
Out[4]: Timedelta('4 days 12:30:05')

want to do a pull-request?

@jreback added the Enhancement and IO Data labels on Oct 20, 2015
@jreback added this to the Next Major Release milestone on Oct 20, 2015
@femtotrader
Author

I don't feel very comfortable with regex

@jreback
Contributor

jreback commented Oct 20, 2015

It's not a regex (anymore). It's all in Cython, see here

@femtotrader
Author

Sorry I don't know cython (maybe one day I will!)

Moreover I don't understand how you will be able to add a pd.Timedelta with DateOffset(s) because

pd.Timedelta("4D12H30M5S") + pd.tseries.offsets.DateOffset(years=3, months=6)

raises

TypeError: unsupported type for add operation

@jreback
Contributor

jreback commented Oct 20, 2015

you wouldn't add a Timedelta directly with an offset. These are 2 different types of things. What you are asking for is an absolute fixed interval which is what a Timedelta is. IOW months are defined to be 30 days, etc.

An offset is more flexible, as it knows its frequency and such.

what you are asking is for an extension of the Timedelta format.

@femtotrader
Author

Why not construct a DateOffset in such a case, using

pd.tseries.offsets.DateOffset(years=3, months=6, days=4, hours=12, minutes=30, seconds=5)

We can't use years with Timedelta:

pd.Timedelta(years=3)

raises

ValueError: cannot construct a TimeDelta from the passed arguments, allowed keywords are [weeks, days, hours, minutes, seconds, milliseconds, microseconds, nanoseconds]

but that's probably a misunderstanding from my side.

When I add a 1 month duration to a date I don't want to add (necessarily) 30 days.
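
To make the difference concrete, here is a small standard-library sketch comparing a fixed 30-day interval with calendar-aware month arithmetic. The `add_months` helper is hypothetical and written only for illustration; it clamps the day to the length of the target month, which is also how `DateOffset(months=1)` behaves as far as I know.

```python
import calendar
from datetime import date, timedelta


def add_months(start, months):
    """Calendar-aware month addition (illustrative helper, not a pandas API).

    The day is clamped to the last day of the target month.
    """
    index = start.year * 12 + (start.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return start.replace(year=year, month=month, day=min(start.day, last_day))


start = date(2015, 1, 31)
fixed = start + timedelta(days=30)     # fixed interval: 2015-03-02
calendar_based = add_months(start, 1)  # calendar month: 2015-02-28
print(fixed, calendar_based)
```

The fixed interval skips February entirely, while the calendar-based addition lands on the last day of the following month.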

@jreback
Contributor

jreback commented Oct 20, 2015

@femtotrader exactly, this is why for your use case you almost certainly want to use a DateOffset.

(and that's why I didn't implement years/months on a Timedelta as they are actually not very important BECAUSE they are not fixed intervals).

notice that days,hours,seconds....etc are fixed intervals.
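
As a small illustration of what "fixed interval" means here, using the standard library's `timedelta` as a stand-in for a pandas Timedelta (an assumption made only to keep the sketch dependency-free): the duration from the earlier example always reduces to the same number of seconds, regardless of the date it is applied to.

```python
from datetime import timedelta

# 4 days, 12 hours, 30 minutes and 5 seconds as one fixed interval.
interval = timedelta(days=4, hours=12, minutes=30, seconds=5)

# A fixed interval collapses to a single number of seconds.
total = interval.total_seconds()
print(total)  # 390605.0
```

No such single number exists for "1 month" or "1 year", which is why those units are left to offsets.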

@jorisvandenbossche
Member

@jreback If I read the explanation of ISO 8601 correctly, I think what it describes as durations should be considered DateOffsets in pandas-world, not timedeltas.

@jorisvandenbossche
Member

That is to say, I don't think it makes sense to support it in the pd.Timedelta parsing

@jreback
Contributor

jreback commented Oct 20, 2015

@jorisvandenbossche yep. We don't support parsing into offsets at all (except for single ones). So this would be a major undertaking, and IMHO not that beneficial.

@jorisvandenbossche modified the milestones: Someday, Next Major Release on Oct 20, 2015
@femtotrader
Author

ISO 8601:2004(E) (Third edition 2004-12-01) can be found here http://dotat.at/tmp/ISO_8601-2004_E.pdf

@chris-b1
Contributor

It probably wouldn't be so bad to support something like DateOffset.from_iso8601_duration as it looks like the definition maps pretty 1-1 with a relativedelta.

I guess then all the other offsets would inherit that method though. Seems like it may have been better to have something like an ABCDateOffset class rather than everything inheriting from DateOffset, which has a pretty specific implementation.

@WillAyd
Member

WillAyd commented Mar 23, 2018

@jreback randomly came across this but should be able to close as part of #15136


5 participants