Add support for Excel date format #977
Comments
Hi @PyGeek03. Do you mind giving us some examples of inputs and expected outputs?
These examples are from the Microsoft documentation page:
An interesting wrinkle is that, for compatibility with Lotus 1-2-3, Excel treats 1900 as a leap year, which it was not. So the difference between the two date systems is not 4 years, but 4 years and 1 day.
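The offset described above can be checked with plain datetime arithmetic. This is a minimal sketch, assuming the conventions in Microsoft's documentation: serial 1 is January 1, 1900 in the 1900 system (with serial 60 being the nonexistent February 29, 1900), and serial 0 is January 1, 1904 in the 1904 system. The constant and function names are illustrative only and do not come from Arrow or xlrd.

```python
from datetime import date, timedelta

# Assumed conventions (taken from Microsoft's documentation, not from this thread):
#   1900 system: serial 1 is 1900-01-01; serial 60 is the phantom 1900-02-29.
#   1904 system: serial 0 is 1904-01-01.
EPOCH_1904 = date(1904, 1, 1)
# Starting from 1899-12-30 absorbs both the 1-based count and the phantom
# leap day, so it is only correct for serials above 60.
EPOCH_1900_AFTER_FEB = date(1899, 12, 30)


def serial_1900_to_date(serial: int) -> date:
    """Convert a 1900-system serial that lies past the phantom leap day."""
    if serial <= 60:
        raise ValueError("serials up to 60 are affected by the 1900 leap-year quirk")
    return EPOCH_1900_AFTER_FEB + timedelta(days=serial)


def serial_1904_to_date(serial: int) -> date:
    """Convert a 1904-system serial to a date."""
    return EPOCH_1904 + timedelta(days=serial)


# Difference between the serial numbers the two systems assign to the same day.
SYSTEM_OFFSET_DAYS = (EPOCH_1904 - EPOCH_1900_AFTER_FEB).days
```

Under these assumptions the same calendar day has a serial number that is 1,462 days larger in the 1900 system than in the 1904 system, which is where the extra day beyond a plain four-year gap shows up.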
Hi @PyGeek03, I was looking a little more into the documentation. Is the input ever a fraction, or float? I just want to make sure there aren't any weird edge cases I'm missing before I propose a solution. Some of our dependencies don't play nicely with floats.
From the source code of xlrd.xldate, that seems possible. I reckon we can just write some wrapper functions around xlrd.xldate, and add it as a dependency?
Hey @PyGeek03. I was playing around with some possible solutions and it looks like we could just create a wrapper that uses our built-in shift function with a start date of either January 1, 1900 or January 1, 1904. Both xlrd and Arrow depend on datetime, so the behaviour from Arrow would presumably be the same as what you are experiencing with xlrd. This approach also avoids adding a dependency to Arrow. Here is a pretty basic example of how this would work.
@jadchaar what are your thoughts? The implementation of this would be very similar to how xlrd does it (https://github.com/python-excel/xlrd/blob/master/xlrd/xldate.py#L130). Also, our built-in shift function does support fractions/float inputs, so this wouldn't be an issue.
I think a native shift solution would work nicely. Since this is a niche use case, we probably want to avoid adding an additional dependency.
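For reference, the wrapper idea discussed above might look roughly like the sketch below. It is written against the standard-library datetime that both Arrow and xlrd build on, so it runs without extra packages; inside Arrow the addition would presumably be a shift(days=...) call on the epoch instead. The function name, the date_system parameter, and the choice of epochs are assumptions made for illustration, not an agreed design.

```python
from datetime import datetime, timedelta

# Hypothetical helper, not part of Arrow or xlrd.
# Assumed epochs: 1899-12-30 for the 1900 system (correct only for serials
# of 61 and above, i.e. past Excel's phantom 1900-02-29) and 1904-01-01 for
# the 1904 system.
_EPOCHS = {
    1900: datetime(1899, 12, 30),
    1904: datetime(1904, 1, 1),
}


def excel_serial_to_datetime(serial: float, date_system: int = 1900) -> datetime:
    """Convert an Excel serial number to a naive datetime.

    The integer part counts days and the fractional part is the time of day,
    so float inputs are handled by the same addition as integers.
    """
    if date_system not in _EPOCHS:
        raise ValueError("date_system must be 1900 or 1904")
    if serial < 0:
        raise ValueError("negative serial numbers are not supported")
    if date_system == 1900 and serial < 61:
        # Early 1900 serials need separate handling because of the leap-year quirk.
        raise ValueError("serials below 61 are ambiguous in the 1900 system")
    return _EPOCHS[date_system] + timedelta(days=serial)
```

A call such as excel_serial_to_datetime(25569.5) treats the .5 as half a day, and passing date_system=1904 switches the epoch. Rejecting early 1900-system serials rather than guessing is one possible choice; xlrd handles that range with its own rules.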
Wow Arrow is even cooler than I thought. Great work guys!
On Sat, May 22, 2021 at 12:02 PM Jad Chaar wrote:
Here is a pretty basic example of how this would work.
import arrow
base_date = arrow.get('1900-01-01')
print(base_date)
new_date = base_date.shift(days=1)
print(new_date)
Feature Request
In Excel files, a date is represented as an integer that counts the days elapsed since either January 1st, 1900 (on Windows) or January 1st, 1904 (on Mac).
The function xlrd.xldate.xldate_as_datetime from the xlrd library is usually used for parsing this date format in Python. Considering Arrow's goal of providing a common API for date parsing, I believe this would be a useful feature for users who have to wrangle multiple date formats in the same column of an Excel file (because of non-standard data entry).
The only issue is, I'm not sure if this format should be supported as a token to use with arrow.get() and Arrow.format() (and what that token should be), or if it should be its own function. I'm happy to work on this once we've decided whether to implement it and how to resolve this question.