ENH: Support parsing <Month Name> <Day number> e.g. Jan 1 in date utilities #11430
Comments
Never mind. Never documented. |
@jreback commented here mwaskom/seaborn#702 (comment) it was never meant to be supported |
Further, @jreback, I thought it was maybe not a written rule, but still somewhat generally assumed that pandas did fall back to |
I think 0.17 behavior is consistent, but allowing to pass |
Another related case (also no (full) date part provided in the string) from #16074
so depending on the format of the time string, this does or does not work, so there is at least some inconsistency. But I think we mainly have to decide on whether, if a part of the date (eg the year) or the full date is missing, do we fill it with 0001-01-01 (and with result that it raises an error) or with the current date? |
@jreback thanks for linking the duplicate. Would it be reasonable to use the dateutil parser and allow the user to pass a default datetime? |
@mikedeltalima you can do that as a user |
@jreback I'm not sure I understand. You can specify the default in dateutil, but not pandas. Why shouldn't the user have that option? Also, dateutil chose the current date for their default default, but pandas could choose something else (Jan 1 of current year?). |
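For reference, a minimal sketch of the dateutil behaviour mentioned in the comment above: `dateutil.parser.parse` accepts a `default` datetime, and any field missing from the string is taken from it. The strings and default dates here are made up for illustration; nothing in this sketch is a pandas API.

```python
from datetime import datetime

from dateutil import parser

# Year missing from the string: it is taken from `default`.
no_year = parser.parse("Jan 1", default=datetime(2015, 6, 15))

# Day missing from the string: it is also taken from `default`.
no_day = parser.parse("April 2015", default=datetime(2000, 1, 1))

print(no_year)  # 2015-01-01 00:00:00
print(no_day)   # 2015-04-01 00:00:00
```

Without an explicit `default`, dateutil fills the missing fields from the current date, so the result depends on the day the code runs.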
how is having a default date useful? |
Let's say I have a Series (a column in a DataFrame) that consists of dates like April 5, May 10 etc. None of them specify the year. If I want to capture that information, I need to convert the column using to_datetime, but datetimes need the year specified. Why force the user to implement a (costly?) transform after the fact? |
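One way a user could handle this case today, sketched with the standard library only: append an explicit year to each string before parsing. The helper name `add_year` and the year 2017 are hypothetical, and `%B` assumes English month names (the default C locale).

```python
from datetime import datetime


def add_year(month_day: str, year: int) -> datetime:
    """Parse a 'Month day' string such as 'April 5' using the given year."""
    # The year is appended first so the format string always contains one.
    return datetime.strptime(f"{month_day} {year}", "%B %d %Y")


dates = [add_year(s, 2017) for s in ["April 5", "May 10"]]
print(dates)
```

The resulting list of `datetime` objects can then be passed to `to_datetime`. Note that a string such as "February 29" raises a `ValueError` if the chosen year is not a leap year.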
still not sure what you mean |
Just to be sure it is clear for everybody:
When it becomes more strange is when filling in the current day of the month:
(today is the 27th of April) So we can discuss whether we should follow that behaviour in pandas or not. In #7599 we decided to not follow that for at least the filling of the current day of the month (the last more strange example). The consequence is that we also do not follow the rule for filling the current year, at least for certain formats (this issue), a consequence which was not fully on purpose I think (or at least this is not discussed / tested in the original PR). Filling with current year as dateutil does, has at least some usecase I think, but if you want this, you can always directly use the
Jeff, note that we currently actually still do that in specific cases, depending on the format of the string (see my example above #11430 (comment)) In any case, the current situation is also not ideal, as certain cases still, and I think the error message should not be an OutOfBounds error, but an error message indicating that the string could not be parsed because (part of) the date was missing. |
@jorisvandenbossche thanks for the explanation! You've really improved the readability of the conversation :) Could you elaborate on the workaround? This is what I could do, but I wonder if there are good reasons to use to_datetime instead if I can fix it.
That said, @jreback I don't think my suggestion involves pandas guessing anything. It would simply allow the user to override a default (which pandas is already using). As you can see from the last two lines in my example above, there seems to be a bug that does not allow pandas to recognize some datetimes as the correct dtype. Looks like the cutoff is September 21, 1677. :)
|
Regarding the 1677, the reason for that is very simple, once you know it: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#timeseries-timestamp-limits When you try to parse a string outside of the supported range, you get an OutOfBounds error, but if you pass a |
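The 1677 cutoff can be reproduced with a little arithmetic, assuming (as the linked documentation describes) that timestamps are stored as a signed 64-bit count of nanoseconds from the Unix epoch. This is a standard-library sketch, so the results are truncated to microsecond resolution.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1  # largest signed 64-bit integer

# datetime only resolves microseconds, so the nanosecond counts are
# truncated to whole microseconds before being added to the epoch.
upper = EPOCH + timedelta(microseconds=INT64_MAX // 1000)
lower = EPOCH - timedelta(microseconds=(2**63) // 1000)

print(lower.date())  # 1677-09-21
print(upper.date())  # 2262-04-11
```

A default such as 0001-01-01 lies far outside this range, which is why a string with no date part ends in an OutOfBounds error.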
@jorisvandenbossche so proposed fix: pick a default that is within the bounds of the timestamp limits (Jan 1 of current year up to pd.Timestamp.max) and allow users to pass a default (raise exception if out of bounds). |
so you want to add an argument to the constructor of so
I suppose we could implement logic, or simply pass thru to |
@jreback is that necessary? I was only thinking of adding the argument to the |
by-definition when things get down to use so making this more explicit with a default value makes sense. |
strong -1 on adding extra arguments to datetime parsing, it's fine for this to error |
Agreed here. I think whatever dateutil can parse as a string is sufficient at this point so closing |
I assume that this was officially supported before. Haven't narrowed it down any more than sometime between 0.16.2 and 0.17.0.