Improve performance with large worksheets #16

Open
willtrnr opened this issue Apr 6, 2019 · 5 comments

Comments

@willtrnr
Owner

willtrnr commented Apr 6, 2019

This will serve as an umbrella issue for performance improvement.

Currently, BIFF record reading does a fair amount of copying that could potentially be avoided, and there's also the possibility of using a C extension (or Cython).
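
To illustrate the kind of copying that could be avoided, here is a minimal sketch using `memoryview` slices instead of `bytes` slices. The record layout (1-byte id, 2-byte little-endian length, then the payload) and the name `iter_records` are assumptions made up for the example; this is not the real BIFF12 encoding nor the library's actual reader.

```python
import struct


def iter_records(buf):
    """Yield (record_id, payload) pairs without copying payload bytes.

    Assumes a simplified, hypothetical layout: 1-byte id, 2-byte
    little-endian length, then the payload. Not the real BIFF12 encoding.
    """
    view = memoryview(buf)
    pos = 0
    while pos + 3 <= len(view):
        # unpack_from reads straight from the buffer at the given offset.
        rec_id, length = struct.unpack_from("<BH", view, pos)
        pos += 3
        # Slicing a memoryview gives a window onto the same buffer, not a copy.
        yield rec_id, view[pos:pos + length]
        pos += length


# Two records: id 1 with a 2-byte payload, id 2 with a 1-byte payload.
data = bytes([1, 2, 0, 0xAA, 0xBB, 2, 1, 0, 0xCC])
# bytes(payload) copies only when the caller actually needs an owned value.
records = [(rec_id, bytes(payload)) for rec_id, payload in iter_records(data)]
```

The payloads stay as views onto the original buffer until something explicitly converts them, so a reader that only inspects a few fields per record never duplicates the rest.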

@hpca01

hpca01 commented Aug 2, 2019

Hi, not sure if you're still developing this lib. I wrote a Rust extension to parse XLSB using Calamine. I'm currently working on the CFFI side and using milksnake to bridge between the two. Would you be interested in integrating it into your module?
Performance-wise, it takes about 10-20 seconds to copy ~3 million cells to CSV as a pure binary CLI.

The CLI should work on any 32-bit Windows environment. I'm not sure what your official compatibility requirements are, but compiled binaries usually work on whatever platforms they were compiled on (given you had the correct toolchain). So if you have access to Linux, Mac, and Windows environments, you could theoretically embed three different binary files to do the parsing.

@willtrnr
Owner Author

willtrnr commented Aug 2, 2019

@hpca01 that sounds interesting, however I've been putting off native modules to avoid having to compile and distribute binaries (which in the case of OSX, I won't even be able to test).

Gotta say, I think a pure Python implementation is an interesting feature to have, since non-CPython interpreters just work (e.g. Jython and IronPython).

I think the majority of the process should stay like that, but we could have optional compiled modules for certain parts (think cPickle vs pickle).
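
As a rough sketch of that cPickle-vs-pickle pattern: try to import a compiled helper and fall back to the pure Python version when it isn't there. The module name `_xlsb_speedups` and the function `read_double` are hypothetical, made up for illustration only.

```python
import struct


def _py_read_double(data, offset=0):
    """Pure Python fallback: decode a little-endian 8-byte float."""
    return struct.unpack_from("<d", data, offset)[0]


try:
    # Hypothetical optional compiled module; the name is an assumption.
    from _xlsb_speedups import read_double
except ImportError:
    # No compiled module available (e.g. on Jython or IronPython),
    # so use the pure Python implementation under the same name.
    read_double = _py_read_double
```

Callers only ever use `read_double`, so the rest of the code doesn't need to know which implementation was picked.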

@hpca01

hpca01 commented Aug 2, 2019

@wwwiiilll yeah, I figured the compiled binary wouldn't be ideal. However, dylibs shouldn't require compilation across different OSes, the only difference being 32-bit vs 64-bit Python versions. Once I get the CFFI bits working as expected, it will behave like a regular module in Python. I'll shoot you a message when I figure out this C ABI stuff with Rust, and you can decide whether to incorporate it.

@chfw

chfw commented Aug 2, 2019

@wwwiiilll, I think I can put myself forward to test on OSX, but I'm a novice when it comes to compiling it. So if @hpca01 and @wwwiiilll would like to offer some help, I can join in.

@willtrnr
Owner Author

willtrnr commented Aug 3, 2019

I really appreciate the sentiment, guys, but I'm a little hesitant about requiring potential developers to have the Rust toolchain and, hell, Rust knowledge on hand to work on this (though I can't say this has happened so far).

But, I guess, if you can come up with a module that is optional and can be built with the usual setup.py, I'd gladly try to incorporate it. I believe the pain point that a native module could address is record parsing; there's quite a bit of byte copying going on there.


3 participants