Added ISO 8601 Duration string constructor for Timedelta #19065
pandas/_libs/tslibs/timedeltas.pyx
Outdated
Match a provided string against an ISO 8601 pattern, providing a group for
each ``Timedelta`` component.
"""
pater = re.compile(r"""P
you prob want to compile this pattern in a module level variable to avoid repeatedly doing this.
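A minimal sketch of what the reviewer is suggesting: compile the pattern once at module level and reuse it on every call. The pattern and the `ISO_PATTERN` name here are simplified stand-ins for illustration, not the ones used in the PR.

```python
import re

# Compiled once at import time; every call to match_iso_format reuses it.
# Simplified, hypothetical pattern: the PR's real pattern is not shown in full.
ISO_PATTERN = re.compile(
    r"""P
        (?P<days>\d+)D
        T
        (?P<hours>\d+)H
        (?P<minutes>\d+)M
        (?P<seconds>\d+)
        (?:\.(?P<fraction>\d{1,9}))?S
    """,
    re.VERBOSE,
)


def match_iso_format(value):
    """Return a match object for an ISO 8601 duration string, or None."""
    return ISO_PATTERN.fullmatch(value)
```

Python's `re` module keeps a small internal cache of compiled patterns, so the saving is mostly the cache lookup and the clarity of having the pattern defined in one place.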
pandas/_libs/tslibs/timedeltas.pyx
Outdated
@@ -506,6 +526,33 @@ def _binary_op_method_timedeltalike(op, name):
# ----------------------------------------------------------------------
# Timedelta Construction


def _value_from_iso_match(match):
make cdef and type this as int64_t
pandas/_libs/tslibs/timedeltas.pyx
Outdated
match_dict = {k: int(v) for k, v in match_dict.items()}
nano = match_dict.pop('nanoseconds', 0)

return nano + convert_to_timedelta64(timedelta(**match_dict), 'ns')
just return the nanoseconds here
Not sure I follow - don't we need the timedelta64 to get the nanosecond precision of all the components?
better to use timedelta_from_spec and construct the number of nanoseconds, then simply return that
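One way to read this suggestion, sketched in plain Python: add up each component as an integer number of nanoseconds and return that total, without building an intermediate `timedelta`. The `NS_PER_UNIT` table and `components_to_ns` helper are hypothetical; the PR itself would call pandas' internal `timedelta_from_spec`, whose behaviour is not reproduced here.

```python
# Hypothetical unit table, used only for this illustration.
NS_PER_UNIT = {
    "days": 86_400 * 10**9,
    "hours": 3_600 * 10**9,
    "minutes": 60 * 10**9,
    "seconds": 10**9,
}


def components_to_ns(components, fraction=""):
    """Sum string components into integer nanoseconds without using floats."""
    ns = sum(int(v) * NS_PER_UNIT[k] for k, v in components.items())
    if fraction:
        # Right-pad to 9 digits so "001" means 1 ms and "000000123" means 123 ns.
        ns += int(fraction.ljust(9, "0")[:9])
    return ns
```

Treating the fractional seconds as a digit string avoids float rounding, which matters at nanosecond resolution.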
pandas/_libs/tslibs/timedeltas.pyx
Outdated
@@ -506,6 +526,33 @@ def _binary_op_method_timedeltalike(op, name):
# ----------------------------------------------------------------------
# Timedelta Construction


def _value_from_iso_match(match):
call this parse_iso_format_string
it shouldn't take the match, but the string itself.
pandas/_libs/tslibs/timedeltas.pyx
Outdated
    match = match_iso_format(value)
    value = _value_from_iso_match(match)
else:
    value = np.timedelta64(parse_timedelta_string(value))
this should look like
if len(value) > 0 and value[0] == 'P':
    value = parse_iso_format(value)
else:
    value = parse_timedelta_string(value)
value = np.timedelta64(value)
('P0DT0H0M0.000000123S', Timedelta(nanoseconds=123)),
('P0DT0H0M0.00001S', Timedelta(microseconds=10)),
('P0DT0H0M0.001S', Timedelta(milliseconds=1)),
('P0DT0H1M0S', Timedelta(minutes=1))])
add some invalid matches that should raise (prob need to catch inside the parsing function and raise a nice message)
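A rough sketch of the behaviour being asked for: the parsing function itself raises a `ValueError` with a readable message when the string does not match, so callers do not have to check for a sentinel. The pattern is deliberately minimal, and the error wording and list of invalid inputs are assumptions for illustration, not taken from the PR.

```python
import re

# Deliberately minimal pattern for illustration; not the pattern from the PR.
_ISO = re.compile(r"P(\d+)DT(\d+)H(\d+)M(\d+)S")


def parse_iso_format_string(value):
    """Return the duration in nanoseconds, raising ValueError on bad input."""
    match = _ISO.fullmatch(value)
    if match is None:
        raise ValueError(f"Invalid ISO 8601 Duration format - {value!r}")
    days, hours, minutes, seconds = (int(g) for g in match.groups())
    total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return total_seconds * 10**9


# Strings that this sketch rejects; each one prints the error message.
for bad in ["P", "PT0S", "P1Y0DT0H0M0S", "-P0DT0H0M1S", ""]:
    try:
        parse_iso_format_string(bad)
    except ValueError as err:
        print(err)
```

In a pytest suite the loop would more naturally become a parametrized test wrapped in `pytest.raises(ValueError)`.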
Codecov Report
@@ Coverage Diff @@
## master #19065 +/- ##
=======================================
Coverage 91.51% 91.51%
=======================================
Files 148 148
Lines 48680 48680
=======================================
Hits 44550 44550
Misses 4130 4130
I wonder if a hand written parser is better (though certainly more work), so maybe in the future
    for k, v in match_dict.items():
        ns += timedelta_from_spec(v, '0', k)

else:
you should just raise here
Not an expert on Cython but pytest was failing when raising the exception inside of this function. I believe it is attributable to the int64_t function declaration. I could remove that type and raising the exception directly inside the function would work. Otherwise, I was getting the below output during testing.
-------------------------------------------------------------- Captured stderr call --------------------------------------------------------------
AttributeError: 'NoneType' object has no attribute 'groupdict'
Exception ignored in: 'pandas._libs.tslibs.timedeltas.parse_iso_format_string'
AttributeError: 'NoneType' object has no attribute 'groupdict'
oh, you just need to add:
except? -1:
to the declaration; this tells Cython that you may raise in a cdef
function so it should check. You don't need to explicitly return -1 though.
Ah OK thanks. Will make changes and re-push
pandas/_libs/tslibs/timedeltas.pyx
Outdated
value = np.timedelta64(parse_timedelta_string(value))
if len(value) > 0 and value[0] == 'P':
    iso_val = parse_iso_format_string(value)
    if iso_val == -1:
this logic is much simpler if you raise above
pandas/_libs/tslibs/timedeltas.pyx
Outdated
Returns
-------
ns: int64_t
    Precision in nanoseconds of matched ISO 8601 duration, or -1 if
add a Raises section instead
thanks!
Just seeing this and #11375 now. In #11375 (comment) @jorisvandenbossche mentioned these would be better represented as I think it's still useful to support this as a format for parsing timedeltas, but we may want to put some warnings / caveats in the documentation.
git diff upstream/master -u -- "*.py" | flake8 --diff
ASV results below