Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop extra type check in _extract_serialize #4281

Merged

Conversation

jakirkham
Copy link
Member

Currently _extract_serialize performs a type check on its arguments and then when walking through any collections it performs type checks on any values it encounters. If any of those values are collections, it effectively performs the same type check twice. Thus for more deeply nested structures, these additional type checks will add up for every layer. To fix this, we perform a type check on the arguments provided to extract_serialize and do a bit of preparation on the arguments to _extract_serialize. Additionally when looping over values in _extract_serialize, we leverage this one type check to do any other preparation we may need to do with that info. As a result we are able to eliminate this overhead.

Here's a rough benchmark showing the effect of this change:

Before:
In [1]: from distributed.protocol.serialize import extract_serialize

In [2]: data = 1_000_000 * b"abc"
   ...: msg = 11 * [10 * [2 * [5 * [data]]]]

In [3]: %timeit extract_serialize(msg)
943 µs ± 39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
After:
In [1]: from distributed.protocol.serialize import extract_serialize

In [2]: data = 1_000_000 * b"abc"
   ...: msg = 11 * [10 * [2 * [5 * [data]]]]

In [3]: %timeit extract_serialize(msg)
786 µs ± 9.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Comment on lines -461 to +453
x2.extend(repeat(None, len(x)))
x2 = len(x) * [None]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it is worth noting that this change of list construction is significantly faster on its own. By reducing the type checks performed, we are able to take advantage of this improvement by creating the list as we intend without filling up an existing list later.

In [1]: %%timeit l = []
   ...: l.extend(repeat(None, 100))
   ...: 
   ...: 
641 ns ± 7.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [2]: %%timeit l = []
   ...: l = 100 * [None]
   ...: 
   ...: 
348 ns ± 10.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

@jakirkham jakirkham force-pushed the drop_xtra_typ_chk__extract_serialize branch from 23181f2 to 0dc7cfb Compare November 25, 2020 18:11
@jakirkham jakirkham force-pushed the drop_xtra_typ_chk__extract_serialize branch from 0dc7cfb to d9eaa0a Compare November 25, 2020 18:20
@mrocklin
Copy link
Member

Fine by me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants