
Use memoryview in unpack_frames #3980

Merged: 2 commits merged into dask:master from use_memoryview_unpack_frames on Jul 22, 2020

Conversation

@jakirkham (Member) commented on Jul 21, 2020:

As part of unpack_frames, we slice out each frame we'd like to extract (see code snippet below).

frame = b[start:end]

However, this causes a copy, which increases memory usage and creates a notable bottleneck when unpacking frames. Closer inspection of unpack_frames shows that this slicing dominates the time of that function and accounts for roughly half of the time spent in deserialize_bytes. Also, since deserialize_bytes typically works with a bytes object, these frames end up being bytes objects themselves, which we then need to copy later to produce mutable frames (see PR #3967 and related context). In other words, the copy performed in unpack_frames is wasted effort.
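
For illustration (this snippet is not from the PR itself): slicing a bytes object copies the selected range, while slicing a memoryview over the same buffer only creates a new view.

b = bytes(80_000_000)          # ~80 MB of zeros
mv = memoryview(b)

frame_copy = b[8:40_000_008]   # allocates and copies ~40 MB
frame_view = mv[8:40_000_008]  # constant time; no data copied

assert frame_view.obj is b     # the view still references the original bytes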

To fix this issue, we coerce the input of unpack_frames to a memoryview. Slicing then merely produces views onto the underlying data, which is essentially free; this avoids the copy and alleviates the bottleneck. It also just works with most Python APIs (like struct.unpack_from), since they accept any bytes-like object, including memoryviews. The details can be seen in the benchmark below, which uses deserialize_bytes, part of the unspilling code path that calls into unpack_frames. This roughly halves the time spent in the unspilling code path.
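
As a rough sketch of the idea (simplified, and assuming frame lengths are stored as a 64-bit frame count followed by 64-bit per-frame lengths; the real unpack_frames in distributed.protocol is the authoritative version):

import struct

def unpack_frames_sketch(b):
    # Coerce once; every slice below is then a zero-copy view.
    b = memoryview(b)

    # struct.unpack_from accepts any bytes-like object, including memoryviews.
    (n_frames,) = struct.unpack_from("Q", b)
    lengths = struct.unpack_from("%dQ" % n_frames, b, 8)

    frames = []
    start = 8 + 8 * n_frames
    for length in lengths:
        end = start + length
        frames.append(b[start:end])  # a view, not a copy
        start = end
    return frames

Since each returned view keeps the original buffer alive, nothing is copied until a consumer actually needs a private, mutable buffer.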



Before:

In [1]: import numpy 
   ...: import pandas 
   ...: from distributed.protocol import serialize_bytelist, deserialize_bytes                                                 

In [2]: df = pandas.DataFrame({ 
   ...:     k: numpy.random.random(1_000_000) 
   ...:     for i, k in enumerate(map(chr, range(ord("A"), ord("K")))) 
   ...: })                                                                                                                     

In [3]: b = b"".join(serialize_bytelist(df))                                                                                   

In [4]: %timeit deserialize_bytes(b)                                                                                           
37.5 ms ± 243 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

After:

In [1]: import numpy 
   ...: import pandas 
   ...: from distributed.protocol import serialize_bytelist, deserialize_bytes                                                 

In [2]: df = pandas.DataFrame({ 
   ...:     k: numpy.random.random(1_000_000) 
   ...:     for i, k in enumerate(map(chr, range(ord("A"), ord("K")))) 
   ...: })                                                                                                                     

In [3]: b = b"".join(serialize_bytelist(df))                                                                                   

In [4]: %timeit deserialize_bytes(b)                                                                                           
19.2 ms ± 188 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Selecting out each frame from the input causes a copy, which increases
memory usage and slows down `unpack_frames`. To fix this, coerce the
input to a `memoryview`. This way slices into the `memoryview` only take
a view onto the underlying data, which is quite fast and doesn't result
in additional memory usage.
@mrocklin (Member):
Nice. +1

@quasiben (Member):
That is so cool! Thanks @jakirkham! And again, thank you for providing timing reports as well.

@quasiben quasiben merged commit eb10a53 into dask:master Jul 22, 2020
@jakirkham jakirkham deleted the use_memoryview_unpack_frames branch July 22, 2020 00:22
@jakirkham (Member, Author):
Thanks all! 😄
