WIP: Avoid recursion in `_extract_serialize` #4258

jakirkham · 2020-11-20T04:16:43Z

This rewrites `_extract_serialize` to avoid recursion. It does so by using a `collections.deque` to track all of the nested `dict`s and `list`s that still need to be visited. A pass through each `dict` or `list` is completed before passing through any nested ones; otherwise it behaves mostly like a depth-first search. These are looped through until no more can be found, at which point control returns to `extract_serialize`.

Instead of recursively calling `_extract_serialize`, simply maintain a queue of collections to visit. When starting out, pop the first collection from the queue and process it. As we encounter new collections, add them to the queue to visit later. This way we still get the same behavior as if we were recursing, but we lower the overhead by not having to maintain a deep call stack holding state that isn't needed.
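For readers who want to see the shape of this approach, here is a minimal sketch of a queue-driven traversal. It is not the code from this PR: `Marked` and `extract_marked` are hypothetical names made up for illustration, and the real `_extract_serialize` has a different signature and handles more cases.

```python
from collections import deque


class Marked:
    """Hypothetical stand-in for a leaf that must be pulled out of the message."""

    def __init__(self, data):
        self.data = data


def extract_marked(msg):
    """Copy a nested dict/list message, collecting Marked leaves by path.

    Returns (copy, found). Marked values are dropped from dicts and
    replaced by None in lists; `found` maps each path tuple to the leaf.
    """
    found = {}
    out = {} if isinstance(msg, dict) else [None] * len(msg)
    # Each entry is (source collection, destination collection, path so far).
    q = deque([(msg, out, ())])
    while q:
        src, dst, path = q.popleft()
        items = src.items() if isinstance(src, dict) else enumerate(src)
        for key, value in items:
            here = path + (key,)
            if isinstance(value, Marked):
                found[here] = value
            elif isinstance(value, dict):
                dst[key] = child = {}
                q.append((value, child, here))  # visit later, no recursion
            elif isinstance(value, list):
                dst[key] = child = [None] * len(value)
                q.append((value, child, here))
            else:
                dst[key] = value
    return out, found
```

Because each queued entry carries its own destination and path, the result does not depend on visiting order; swapping `popleft()` for `pop()` would treat the deque as a stack and produce the same output.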

jakirkham · 2020-11-25T18:40:30Z

The type check improvements themselves are useful, so I have broken them out into a separate PR (#4281).

That said, eliminating recursion here doesn't seem to have much impact. At this point just calling `to_serialize` is more expensive than anything else here, which isn't saying much.

Going to go ahead and close this out. If things change, we can always revisit.

mrocklin · 2020-11-25T18:40:34Z

distributed/protocol/serialize.py

+
+
+def _extract_serialize(x_items, x2, ser, bytestrings, path=()):
+    q = deque()


If we're only using this as a stack, then maybe this should be a list rather than a deque.

jakirkham · Nov 25, 2020

Yeah, I tried both. The deque performed better in the benchmarking I did. That said, it seems that recursion itself is already handled quite efficiently.
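To repeat the list-versus-deque comparison locally, something like the following sketch could serve as a starting point. It is not the benchmark used in this PR; the function names and sizes are arbitrary assumptions, and timings will vary with the interpreter version and workload.

```python
from collections import deque
from timeit import timeit


def drain_with_list(n):
    """Use a list as a LIFO stack: pop from the right until empty."""
    stack = list(range(n))
    total = 0
    while stack:
        total += stack.pop()
    return total


def drain_with_deque(n):
    """Use a deque as a LIFO stack: pop from the right until empty."""
    stack = deque(range(n))
    total = 0
    while stack:
        total += stack.pop()
    return total


if __name__ == "__main__":
    # Print wall-clock time for each variant; no expected numbers are implied.
    for fn in (drain_with_list, drain_with_deque):
        print(fn.__name__, timeit(lambda: fn(10_000), number=200))
```

Both containers support `append` and `pop` at the right end, so either can back a stack; the deque only becomes necessary if items are also removed from the left.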

jakirkham mentioned this pull request Nov 20, 2020

20 iterations with environment level profiling quasiben/dask-scheduler-performance#15

Open

jakirkham added 2 commits November 25, 2020 10:19

Drop extra type check in _extract_serialize

d9eaa0a

jakirkham force-pushed the avoid_recurse_extract_serialize branch from 51a3077 to 41f4a0c Compare November 25, 2020 18:25

jakirkham closed this Nov 25, 2020

mrocklin reviewed Nov 25, 2020


jakirkham deleted the avoid_recurse_extract_serialize branch November 25, 2020 18:40

jakirkham mentioned this pull request Nov 25, 2020

line_profiler results on 4 workers (w/o stealing) over 20 iterations quasiben/dask-scheduler-performance#20

Open
