BUG: Don't error in pd.to_timedelta when errors=ignore #13832
try:
    return Series(values, index=arg.index,
                  name=arg.name, dtype='m8[ns]')
except ValueError:
this try/except needs to be inside `_convert_listlike`, e.g. if you have an invalid list/Index this will still not work (you are handling scalar/Series only)
alternately, you could leave `_convert_listlike` and wrap IT in another function if that is simpler / easier to follow
I have no idea what you said in the first comment. Please provide an example.
I don't like this try/except at all. `values` should already be dtyped (it could of course be `m8[ns]` or an Index or whatever). This makes it very confusing. The fact that you had to write a comment is a bad code smell here.
The comment was to clarify that the `dtype` of the returned array may not be convertible to `m8[ns]` if, for example, `errors='ignore'` and there's an error in the parsing. What about it is confusing? I was initially confused (before this PR) why we assumed we could cast to `m8[ns]` like this.
that's my point: we don't do this in to_datetime and it's wildly inconsistent to do it here when they have the same API
the return value needs to be validated before Series construction
that is the issue here / this is exactly what array_to_datetime does and I think is now inconsistent with array_to_timedelta
no reason for them to be different in how they act
@jreback: IMHO your attempt to keep everything consistent is somewhat in vain. Notice that in `to_datetime` we don't specify `dtype`, whereas in `to_timedelta`, we do. This means that regardless of what `_convert_listlike` returns in `to_datetime`, we can still return a `Series`. The same cannot be said for `to_timedelta`, where we also stipulate `dtype=m8[ns]`.
@gfyoung if you have to do this, there is an embedded bug in the Series constructor. Pls don't catch errors just because it's easy; it's not correct.
The convert_list_like MUST return a valid index/scalar (if box=True), or a valid array/list/scalar if not; apparently it does not in some cases.
@jreback : What do you mean "valid"? It's no different from `_convert_listlike` in `to_datetime`. There is no bug in the `Series` constructor. It's just that the case of no conversion being done was not considered. I struggle to understand what you are disagreeing with here.
@jreback : If you remove the `dtype=m8[ns]`, no `try-except` will be needed. It is the `dtype` specification alone that causes us to do this. There is nothing "invalid" about what `_convert_listlike` returns. The behaviour is exactly the same for `to_datetime`.
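The disagreement in this thread can be illustrated with a toy model in plain Python. This is not the pandas internals: the names `convert_listlike_sketch` and `box_result` are hypothetical, and an `isinstance` check against `datetime.timedelta` stands in for the `m8[ns]` dtype. The sketch only shows that when `errors='ignore'` hands the input back unconverted, a boxing step that insists on a timedelta type has to fail, while one that does not force a type succeeds.

```python
from datetime import timedelta


def convert_listlike_sketch(values, errors="raise"):
    """Toy converter: whole seconds (int) -> timedelta; anything else is an error."""
    try:
        converted = []
        for v in values:
            if not isinstance(v, int):
                raise ValueError(f"cannot convert {v!r}")
            converted.append(timedelta(seconds=v))
        return converted
    except ValueError:
        if errors == "ignore":
            # Hand the input back untouched, so its elements are NOT all timedeltas.
            return list(values)
        raise


def box_result(values, require_timedelta):
    """Toy stand-in for building the result, with or without a forced type."""
    if require_timedelta and not all(isinstance(v, timedelta) for v in values):
        raise ValueError("values are not all timedeltas")
    return tuple(values)


unconverted = convert_listlike_sketch([1, "apple"], errors="ignore")

# Without a forced type, the unconverted input can still be boxed.
loose = box_result(unconverted, require_timedelta=False)

# With a forced type, boxing fails even though the converter did what was asked.
try:
    box_result(unconverted, require_timedelta=True)
    strict_failed = False
except ValueError:
    strict_failed = True
```

In this toy, the failure comes from the forced type at the boxing step, not from the converter, which is the distinction the two reviewers are arguing about.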
Current coverage is 85.28% (diff: 100%)
@@             master   #13832   diff @@
==========================================
  Files           139      139
  Lines         50211    50230    +19
  Methods           0        0
  Messages          0        0
  Branches          0        0
==========================================
+ Hits          42819    42837    +18
- Misses         7392     7393     +1
  Partials          0        0
you need to add the list/Index tests as I indicated. Further the problem is the separation of concerns here. the
What are you talking about? Please read my comments first. I moved all of the error handling out of Cython for reasons that I outlined above.
that's not how Timestamp works - it needs to back
Again, I have no idea what you are saying.
5435014 to a8adc88
@jreback : Any updates or replies to my comments?
a8adc88 to 49b8d67
@gfyoung so my issue with this is the following. in the current code all handling of The main reason of course is that by-definition So while your approach seems ok, it now changes the way a reader has to understand what is going on (in timedelta vs timestamp parsing). So I am open to changing things, just want to make them consistent. Also I think it's much easier to follow if it's all in cython. No real reason to move it.
@jreback : The major issue with the current implementation was that there was significant coupling between parsing and error behaviour. For example, take a look at this line here. Why should Secondly, there was no real error handling at the Cython level (this bug attests to that), and IMO that shouldn't be black-boxed into Cython but rather exposed at the Python level like we do with
@gfyoung pls re-read what I wrote. you are changing code that is very similar in
not sure what you mean by this. I am happy to have this work, but it has to be consistent code.
@jreback : I'm not sure why it matters so much that the format be the same here. What's the justification for keeping it? Here is my justification for writing it the way I did. My point is that the conversion process should be separated into two parts: 1) The actual conversion, and 2) How we handle errors. The current implementation of In addition, the conversion process becomes littered with The framework I have proposed here is exactly what happens in
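The two-part split described in this comment (the actual conversion, then how errors are handled) might look roughly like the following sketch. It is an illustration in plain Python under stated assumptions, not the code from this PR: `parse_seconds` and `convert_with_policy` are hypothetical names, the parser only understands strings like `'90s'`, and `None` stands in for `NaT`.

```python
from datetime import timedelta


def parse_seconds(text):
    """Part 1, conversion only: parse a string such as '90s'; raise otherwise."""
    if not (isinstance(text, str) and text.endswith("s")):
        raise ValueError(f"unrecognized timedelta string: {text!r}")
    # float() raises ValueError when the numeric part is malformed.
    return timedelta(seconds=float(text[:-1]))


def convert_with_policy(values, errors="raise"):
    """Part 2, error handling only: decide what a failed conversion means.

    Assumes `errors` was already validated by the caller.
    """
    result = []
    for v in values:
        try:
            result.append(parse_seconds(v))
        except ValueError:
            if errors == "raise":
                raise
            if errors == "ignore":
                return list(values)  # give back the original input unchanged
            result.append(None)  # 'coerce': None stands in for NaT in this toy
    return result
```

With this shape, the parser never needs to know which `errors` mode is active, so adding or changing a mode touches only the wrapper.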
@gfyoung as I said. I am happy to change. BUT it needs to be for BOTH
@jreback : Again, I don't understand why this is such a big deal. But if you insist...
@gfyoung I am not saying the code in
@jreback : fair enough, but for org purposes, can the refactoring of
sure, but do it on top of this one.
@jreback : Fair enough. In that case, this should be good to merge if there are no other concerns.
@gfyoung right, but the point is I'd like to see the combined changes. don't get me wrong I am all for stripping things out of run an asv just to be sure in any event.
@jreback : Another PR that is rebased on this one...fair enough.
@gfyoung yes exactly. these are nice for a discrete series of changes. a bit more work, but allow one to work pretty easily.
@jreback : So trying to push error handling onto the Python level for
Also, there already is error handling at the Python level for
value = TimedeltaIndex(value, unit='ns', name=name)
return value
if errors not in ('ignore', 'raise', 'coerce'):
    raise ValueError("errors must be one of 'ignore', "
is this tested?
Absolutely. That was one of the first tests that I wrote.
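For readers unfamiliar with the pattern being reviewed here, a minimal sketch of up-front keyword validation follows. The helper name `validate_errors`, the `VALID_ERRORS` constant, and the wording of the message are assumptions made for illustration; they are not taken from the PR, whose own message is cut off in the snippet above.

```python
VALID_ERRORS = ("ignore", "raise", "coerce")


def validate_errors(errors):
    """Reject an unknown `errors` value before any parsing is attempted."""
    if errors not in VALID_ERRORS:
        # Hypothetical message wording, chosen for this sketch only.
        raise ValueError(
            f"errors must be one of {VALID_ERRORS!r}, got {errors!r}"
        )
    return errors


def rejects(value):
    """Small helper for a test: True if `value` is refused with ValueError."""
    try:
        validate_errors(value)
    except ValueError:
        return True
    return False
```

A test for this check only needs one invalid value and the three valid ones, for example `rejects("skip")` alongside calls with `"ignore"`, `"raise"`, and `"coerce"`.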
ok added some comments. pls run an asv as well (not really sure we have enough asv coverage though)
@jreback : Ran
Looks good to me. Regarding the asv benchs, I don't think there are currently benchmarks that cover the coerce/ignore behaviour, so it may be nice to add some.
cdef:
    Py_ssize_t i, n
    ndarray[int64_t] iresult
    bint is_raise=errors=='raise', is_ignore=errors=='ignore', is_coerce=errors=='coerce'

assert is_raise or is_ignore or is_coerce
you need to validate the errors kw here I think (with an assert is fine)
the error keyword is validated in `to_timedelta` itself I think
Exactly. That's why I put the check in the first place.
sure but this still could use an assert
this helps future readers know the possibilities and avoid missteps
Fair enough. Added a check in `array_to_timedelta64`.
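The point of the assert discussed in this thread is documentation rather than user-facing validation. A plain-Python sketch of the idea follows; `array_convert_sketch` is a hypothetical stand-in for the Cython routine, it converts to `int` rather than to timedeltas, and `None` stands in for `NaT`.

```python
def array_convert_sketch(values, errors="raise"):
    """Toy inner routine; the public caller is assumed to have validated `errors`."""
    is_raise = errors == "raise"
    is_ignore = errors == "ignore"
    is_coerce = errors == "coerce"
    # Not user-facing validation: this records the only three modes for
    # future readers and fails fast if an internal caller passes something else.
    assert is_raise or is_ignore or is_coerce

    out = []
    for v in values:
        try:
            out.append(int(v))
        except (TypeError, ValueError):
            if is_raise:
                raise
            if is_ignore:
                return list(values)  # original input, unconverted
            out.append(None)  # is_coerce
    return out
```

Because the public function already raises `ValueError` for bad input, the assert here should never fire in normal use; it exists to make the invariant visible where the flags are consumed.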
pls add some asvs, mainly for arrays of timedelta string parsing. This was previously in-line code, so like to see what happened.
49b8d67 to ef586b8
@jreback : There are already benchmarks for arrays of timedelta string parsing. However, I added one for bad parsing to exercise the
the wrapping of the So I am -1 on merging until this is addressed.
@jreback Can you explain what you think the bug is in the Series constructor? The try except block is to catch unconverted values, eg if you have
Is it here you see a bug in the Series constructor? I think the above seems correct.
@jorisvandenbossche I have already explained this multiple times. This is not following the guarantees.
Well, so it is at least not a bug in the Series constructor :-) (that's what I understood you said, https://github.com/pydata/pandas/pull/13832/files#r74401562) But, eg in the @gfyoung Is the
@jorisvandenbossche : That would be an option to explore. I have been busy as of late but can take a look later and see if it can be removed. But I should say that I agree 100% with what you are saying. These were points I was trying to make before to @jreback in our discussion about the wrapping. Also, the
ef586b8 to dc39205
@jorisvandenbossche , @jreback : It appears you can remove
@jorisvandenbossche , @jreback : Travis agrees that the
thanks @gfyoung
Title is self-explanatory. Closes #13613.