Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow zero-copy decompression by allowing bytes-like input #230

Merged
merged 4 commits into from
Jul 31, 2020

Conversation

lgarrison
Copy link
Contributor

decompress() and decompress_ptr() currently require the compressed input to be a Python bytes object. bytes objects are nice because they're immutable, but this comes at the cost of copying the data when constructing the bytes. So, in Blosc, if you have compressed data in a bytesarray, or Numpy array, there's no way to decompress the data without incurring this cost. Since Blosc decompression can be nearly as fast as a memcpy(), this performance hit is noticeable.

This PR changes decompress() and decompress_ptr() to accept bytes-like input. Actually, at the CPython level, decompress() already allowed bytes-like input, but this was disallowed by the Python wrapper, so I just made that check a little more permissive.

I think this PR is a strict superset of the old behavior, and it passes make test on my local machine, but feedback is welcome!

@esc
Copy link
Member

esc commented Jul 30, 2020

@lgarrison thanks for submitting this! After reviewing your PR, I thought to myself, "oh this is great, but it will need some tests". So, I just had a look at the test suite, and it seems like this behavior may have already been supported:

https://github.com/Blosc/python-blosc/blob/master/blosc/test.py#L81-L92

Is there something, perhaps about recent Python versions that disallowed this behavior at the Python level?

@lgarrison
Copy link
Contributor Author

Nope, you're absolutely right, it already works with decompress(). I got mixed up with the tests I did on decompress_ptr(). The only changes I made to decompress() were in the docstring. I think decompress_ptr() still does need this PR in order to support buffers, though.

@esc
Copy link
Member

esc commented Jul 30, 2020

@lgarrison interesting. It also seems like there are no tests for decompress_ptr, if this PR is needed for decompress_ptr I would encourage you to add some tests for different types (memoryview etc...) to ensure that it can handle them too! Everything else checks out and I think this can be merged once these tests are in place! 👍 👍 👍

@lgarrison
Copy link
Contributor Author

I added test_decompress_ptr_input_types() to check that bytes, memoryview, bytearray, and Numpy arrays all work. I think there was a network timeout in the AppVeyor build, though.

@esc
Copy link
Member

esc commented Jul 30, 2020

It looks good to me!

@esc
Copy link
Member

esc commented Jul 30, 2020

Yes, the appveyor seems to have timed out.

@esc esc closed this Jul 30, 2020
@esc esc reopened this Jul 30, 2020
@esc
Copy link
Member

esc commented Jul 30, 2020

opened/closed to force CI retrigger.

@esc
Copy link
Member

esc commented Jul 30, 2020

@FrancescAlted do you think it would be OK to skip trying to fix appveyor and use the passing of the other platforms as an indicator?

@FrancescAlted
Copy link
Member

FrancescAlted commented Jul 31, 2020

@FrancescAlted do you think it would be OK to skip trying to fix appveyor and use the passing of the other platforms as an indicator?

Definitely. We have decided to start using Azure for CI for a while now, and the plan is to deprecate appveyor.

@esc
Copy link
Member

esc commented Jul 31, 2020

@FrancescAlted yay! 😁

@esc esc merged commit 0bfedea into Blosc:master Jul 31, 2020
@esc
Copy link
Member

esc commented Jul 31, 2020

@lgarrison thank you again for your contribution!

@lgarrison
Copy link
Contributor Author

Thanks @esc and @FrancescAlted !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants