Pickle dataframes for fast pandas reading. #85

Closed
wants to merge 1 commit into from


@dhimmel dhimmel commented Mar 1, 2017

This pull request adds a step to the data download process that creates pickled versions of the TSVs. The pickles are much faster than the TSVs to read into pandas, reducing file reading time from minutes to seconds.
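
For illustration, the conversion step could look roughly like the sketch below. This is not the code from this pull request: the function name `tsv_to_pickle` and the `.pkl` naming convention are assumptions made for the example, and the TSV is read with default settings (no index column, no dtype hints).

```python
import os

import pandas as pd


def tsv_to_pickle(tsv_path, pickle_path=None):
    """Read a TSV into a DataFrame and write it back out as a pickle.

    Hypothetical helper for illustration; returns the path of the pickle.
    """
    if pickle_path is None:
        # Assumed naming convention: data.tsv -> data.pkl
        pickle_path = os.path.splitext(tsv_path)[0] + ".pkl"
    # Parsing the text is the slow part; it only has to happen once.
    df = pd.read_csv(tsv_path, sep="\t")
    # The pickle stores the parsed DataFrame, so later reads skip parsing.
    df.to_pickle(pickle_path)
    return pickle_path
```

Later runs would then call `pd.read_pickle(pickle_path)` instead of `pd.read_csv`. Pickles are only safe to load from sources you trust, since unpickling can execute arbitrary code.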

Unfortunately, the pickle files are big (~1.2 GB) and pandas does not support pickle compression yet. Otherwise, I would have uploaded the compressed pickles using git lfs. We still could, but I think it's easier to generate these pickles locally than to download 2 extra gigabytes.

dhimmel commented Apr 25, 2017

@NIkota here is where I got to before pandas 0.20.0 was available. You can take over from here (feel free to use some of the changes here or not).

Pandas 0.20.0 will support compressed pickles but will not support reading compressed pickles from a URL: pandas-dev/pandas#13317 (comment). You may still be able to use requests to download the compressed bytes and pass them to read_pickle. The ideal situation is that users can quickly download a smallish file (< 100 MB) and read it into pandas in under ~10 seconds.
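
A minimal sketch of that requests-based workaround follows. It has not been tested against the project's files. It assumes gzip compression and a recent pandas whose `read_pickle` accepts file-like objects; both function names are hypothetical. The bytes are decompressed with the standard library first, so the example does not depend on how pandas handles compression for in-memory buffers.

```python
import gzip
import io

import pandas as pd


def read_gzipped_pickle_bytes(payload):
    """Decompress gzip bytes in memory and unpickle them into a DataFrame."""
    raw = gzip.decompress(payload)
    # Assumes a pandas version whose read_pickle accepts file-like objects.
    return pd.read_pickle(io.BytesIO(raw))


def read_gzipped_pickle_from_url(url, timeout=60):
    """Download a gzip-compressed pickle over HTTP and load it (hypothetical helper)."""
    # Imported here so the decoding helper above works without requests installed.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return read_gzipped_pickle_bytes(response.content)
```

Only load pickles from a URL you control or trust, because unpickling can run arbitrary code.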

@rdvelazquez

@dhimmel can you post the compressed pickle files and share the URL so I can take a crack at this?

@rdvelazquez

@dhimmel are you ok if I close this PR for now? (For housekeeping purposes)

dhimmel commented Aug 28, 2017

> are you ok if I close this PR for now?

Yes! Thanks for helping with the maintenance.


2 participants