Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raise NotImplemented for date parsing args in read_excel #11544 #11870

Merged
merged 1 commit into from
Dec 19, 2015
Merged

raise NotImplemented for date parsing args in read_excel #11544 #11870

merged 1 commit into from
Dec 19, 2015

Conversation

grahamjeffries
Copy link
Contributor

Fixes #11544

The parse_dates and date_parser args are passed to TextReader and then to TextFileReader where they don't seem to have an effect. It was decided to raise the exception at the _parse_excel level however, following suit with the handling of chunksize args.

@jreback jreback added Error Reporting Incorrect or improved errors from pandas IO Excel read_excel, to_excel labels Dec 19, 2015
@jreback jreback added this to the 0.18.0 milestone Dec 19, 2015
@@ -340,4 +340,8 @@ Bug Fixes
- Bug in ``Index`` prevents copying name of passed ``Index``, when a new name is not provided (:issue:`11193`)

- Bug in ``read_excel`` failing to read any non-empty sheets when empty sheets exist and ``sheetname=None`` (:issue:`11711`)

- Bug in ``read_excel`` failing to raise NotImplemented error when `parse_dates` and `date_parser` are provided (:issue:`11544`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use double backticks are NotImplementedError and say: keywords parse_dates and date_parser

@jreback
Copy link
Contributor

jreback commented Dec 19, 2015

lgtm. minor comments. ping when updated & green.

@grahamjeffries
Copy link
Contributor Author

Good to go. I also updated the NotImplemented message for the chunksize keyword.

jreback added a commit that referenced this pull request Dec 19, 2015
raise NotImplemented for date parsing args in read_excel #11544
@jreback jreback merged commit a050a33 into pandas-dev:master Dec 19, 2015
@jreback
Copy link
Contributor

jreback commented Dec 19, 2015

thanks!

@jorisvandenbossche
Copy link
Member

AFAIK, the parse_dates keyword for read_excel does work.
It only works a bit in unexpected ways sometimes, as it is a keyword to specify whether to parse strings to a datetime. And most of the time, if you have dates in an excel file, the column is formatted as dates, the engine knows this and the column will be of datetime dtype regardless of what you specify with parse_dates. Hence the confusion.
But if you have a column that is formatted as text in excel, parse_dates works perfectly to parse these strings into a datetime64 column.

So I think the more correct fix is to clarify the documentation. And maybe also warn on parse_dates=True when you didn't specify index_col

cc @TomAugspurger I just saw you giving that comment on SO: http://stackoverflow.com/questions/34403682/why-does-pandas-parse-dates-when-parse-dates-false

@jorisvandenbossche
Copy link
Member

As a small example it actually does something:

In [43]: dti = pd.date_range('2014-1-1', periods=10)

In [44]: df = pd.DataFrame({'dates':dti, 'strings':pd.Series(dti).astype(str)})

In [46]: df.to_excel('test.xlsx', index=False)

In [47]: pd.read_excel('test.xlsx').dtypes
Out[47]:
dates      datetime64[ns]
strings            object
dtype: object

In [48]: pd.read_excel('test.xlsx', parse_dates=['strings']).dtypes
Out[48]:
dates      datetime64[ns]
strings    datetime64[ns]
dtype: object

@TomAugspurger
Copy link
Contributor

Thanks, I misunderstood how read_excel handled parse_dates.

On Dec 22, 2015, at 5:14 AM, Joris Van den Bossche notifications@github.com wrote:

AFAIK, the parse_dates keyword for read_excel does work.
It only works a bit in unexpected ways sometimes, as it is a keyword to specify whether to parse strings to a datetime. And most of the time, if you have dates in an excel file, the column is formatted as dates, the engine knows this and the column will be of datetime dtype regardless of what you specify with parse_dates. Hence the confusion.
But if you have a column that is formatted as text in excel, parse_dates works perfectly to parse these strings into a datetime64 column.

So I think the more correct fix is to clarify the documentation. And maybe also warn on parse_dates=True when you didn't specify index_col

cc @TomAugspurger I just saw you giving that comment on SO: http://stackoverflow.com/questions/34403682/why-does-pandas-parse-dates-when-parse-dates-false


Reply to this email directly or view it on GitHub.

@grahamjeffries grahamjeffries deleted the bugfix-11544 branch December 23, 2015 17:01
@grahamjeffries
Copy link
Contributor Author

Thanks for the clarification. I'll submit another PR with doc fixes and a warning on parse_dates=True and no index_col.

@grahamjeffries
Copy link
Contributor Author

See PR #12051

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants