Enhancement: XLSB support in read_excel() #8540

Closed
kevindavenport opened this issue Oct 11, 2014 · 15 comments · Fixed by #29836 or #31215
Labels
Enhancement IO Excel read_excel, to_excel

Comments

@kevindavenport

openpyxl and xlrd do not support XLSB. I'm curious whether anyone has looked at integrating (or, more likely, creating) this functionality in pandas. It looks like it could be a Python package in itself.

Spec from Microsoft:
http://msdn.microsoft.com/en-us/library/cc313133(v=office.12).aspx

@xlsb

xlsb commented Oct 12, 2014

Based on https://github.com/python-excel/xlrd/issues/83, it seems that xlrd won't have XLSB support for a while (if ever).

The only liberally licensed tool with XLSB support is https://github.com/SheetJS/js-xlsx, which is written in JavaScript but ships with a Node.js-powered script that can be run from the command line.

@jreback jreback added the IO Excel read_excel, to_excel label Oct 13, 2014
@jreback jreback added this to the Someday milestone Oct 13, 2014
@kevindavenport
Author

I found that link too, but I was hoping I could compile a command line utility that I would call from within Python instead of having to run a node server to execute the conversions.

@xlsb

xlsb commented Oct 14, 2014

@kevindavenport You need to install Node, but you don't need to run it as a server. It's like running a PHP script with the PHP CLI.

@kevindavenport
Author

Am I doing something wrong, then? When I run
$NodeJS/bin/node js-xlsx-master/bin/xlsx.njs
I get:

module.js:340
    throw err;
          ^
Error: Cannot find module 'jszip'
...
...

@xlsb

xlsb commented Oct 14, 2014

@kevindavenport If you downloaded the source directly, you need to run npm install from within the js-xlsx-master directory.

If you run npm install -g xlsx, it creates a symlink /usr/local/bin/xlsx, which you can use like this:

$ xlsx test.xlsx Sheet1 
1,2,3
4,5,6
5,7,9
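
For calling the converter from within Python rather than from the shell, a minimal sketch could look like the following. It assumes the `xlsx` executable from `npm install -g xlsx` is on the PATH and prints the sheet as CSV to stdout, as in the session above. The helper names `parse_csv_output` and `read_sheet_via_cli` are hypothetical and not part of any library.

```python
import csv
import io
import subprocess


def parse_csv_output(text):
    """Parse CSV text (as printed by the CLI) into a list of rows of strings."""
    # Blank lines produce empty rows in csv.reader, so they are filtered out.
    return [row for row in csv.reader(io.StringIO(text)) if row]


def read_sheet_via_cli(path, sheet, executable="xlsx"):
    """Run `<executable> <path> <sheet>` and return the parsed rows.

    Assumption: the CLI writes the requested sheet as CSV to stdout.
    """
    result = subprocess.run(
        [executable, path, sheet],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return parse_csv_output(result.stdout)
```

All values come back as strings, so any type conversion (numbers, dates) would still have to happen on the Python side.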

@velxundussa

Would this library help in the implementation of this feature?

https://pypi.org/project/pyxlsb/

See the following solution on Stack Overflow:
https://stackoverflow.com/questions/45019778/read-xlsb-file-in-pandas-python?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa

Being able to do this directly from pandas would be great.
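
As a rough illustration of that workaround, here is a sketch that reads one sheet with pyxlsb and hands the rows to pandas. It assumes pyxlsb and pandas are installed and that the first row of the sheet holds the column headers. The function names are hypothetical; only `open_workbook`, `get_sheet`, `rows()` and the cell attribute `.v` come from pyxlsb.

```python
def cells_to_values(row):
    """Extract the plain values from one row of pyxlsb cells (each has a `.v`)."""
    return [cell.v for cell in row]


def read_xlsb_rows(path, sheet=1):
    """Return all rows of one sheet as lists of values.

    Assumption: pyxlsb is installed; `sheet` is a 1-based index or a sheet name.
    """
    from pyxlsb import open_workbook  # imported lazily: optional dependency

    with open_workbook(path) as workbook:
        with workbook.get_sheet(sheet) as worksheet:
            return [cells_to_values(row) for row in worksheet.rows()]


def rows_to_frame(rows):
    """Build a DataFrame, treating the first row as the header (an assumption)."""
    import pandas as pd  # imported lazily: optional dependency

    return pd.DataFrame(rows[1:], columns=rows[0])
```

Usage would be along the lines of `rows_to_frame(read_xlsb_rows("report.xlsb"))`, where `report.xlsb` is a placeholder file name.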

@wesm
Member

wesm commented Jul 6, 2018

PRs would be welcome

@kevindavenport
Author

kevindavenport commented Jul 21, 2018

I would love to take a crack at it, but whose endorsement do we need before spending the time trying to integrate pyxlsb natively into pandas read

@gfyoung
Member

gfyoung commented Nov 14, 2018

@kevindavenport : Hey there, sorry that this conversation suddenly went dark. We are more than open to an implementation / PR at this point. If you have the time and are able, just go for it!

@talamb

talamb commented Jul 6, 2019

@gfyoung : So would integrating pyxlsb, as @velxundussa suggested, be an acceptable solution?

@gfyoung
Member

gfyoung commented Jul 6, 2019

@talamb : If you can implement and submit as a PR, we will definitely take a look.

@WillAyd
Member

WillAyd commented Jul 7, 2019

@talamb If you're interested in trying a PR, you might want to take a look at #25427 and #25092, which added reading support for other formats. In a nutshell, for this you would want to copy the existing test files to .xlsb format and add the appropriate parametrization in the test_readers.py module. Then subclass _BaseExcelReader and the rest should fall into place.
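
To illustrate the subclassing pattern in isolation, here is a toy sketch. `BaseReaderSketch` is a hypothetical stand-in, not pandas' private `_BaseExcelReader`, whose actual abstract methods and signatures may differ. `InMemoryReader` plays the role of a new engine and uses a plain dict as its "workbook"; a real xlsb engine would delegate to a library such as pyxlsb instead.

```python
from abc import ABC, abstractmethod


class BaseReaderSketch(ABC):
    """Hypothetical stand-in for a reader base class (not the pandas one)."""

    def __init__(self, source):
        # The base class owns the workflow; subclasses supply the details.
        self.book = self.load_workbook(source)

    @abstractmethod
    def load_workbook(self, source):
        """Open the source and return an engine-specific workbook object."""

    @property
    @abstractmethod
    def sheet_names(self):
        """Return the list of sheet names in the workbook."""

    @abstractmethod
    def get_sheet_data(self, name):
        """Return the sheet's contents as a list of rows (lists of values)."""


class InMemoryReader(BaseReaderSketch):
    """Toy engine: the 'workbook' is a dict mapping sheet name to rows."""

    def load_workbook(self, source):
        return dict(source)

    @property
    def sheet_names(self):
        return list(self.book)

    def get_sheet_data(self, name):
        return [list(row) for row in self.book[name]]
```

The point of the design is that code consuming a reader only needs the shared interface, so adding a format means adding one subclass plus test files in that format.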

@talamb

talamb commented Jul 7, 2019

@WillAyd Thanks! Should be seeing a PR from me in the near future.

@Rik-de-Kort Rik-de-Kort mentioned this issue Nov 25, 2019
@jreback jreback modified the milestones: Contributions Welcome, 1.0 Dec 8, 2019
@praful-potphode

Is this issue fixed?

@gfyoung
Member

gfyoung commented Jan 6, 2020

@praful-potphode We have a PR open (#29836) that is trying to address this issue. If you have any thoughts on pushing that PR forward, that would be great!
