Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Avoid recursion in _extract_serialize #4258

Closed

Conversation

jakirkham
Copy link
Member

This rewrites _extract_serialize to avoid recursion. Does this by making use of a collections.deque to track all of the nested dicts and lists that still need to be visited. A pass through each dict or list is completed before passing through any nested ones. Otherwise it behaves mostly like a depth-first search. These are all looped through until no more can be found. Then it returns to extract_serialize.

Instead of recursively calling `_extract_serialize`, simply maintain a
queue of collections to visit. When starting out, pop the first
collection from the queue and process it. As we encounter new
collections, add them to the queue to visit later. This way we still
have the behavior as if we were recursing, but we lower the overhead by
not having to track a deep stack with many things that don't need to be
included.
@jakirkham jakirkham force-pushed the avoid_recurse_extract_serialize branch from 51a3077 to 41f4a0c Compare November 25, 2020 18:25
@jakirkham
Copy link
Member Author

The type check improvements themselves are useful. So have broken that out as PR ( #4281 ).

That said, eliminating recursion here doesn't seem to have much impact. At this point just calling to_serialize is more expensive than anything else here, which isn't saying much.

Going to go ahead and close this out. If things change, we can always revisit.

@jakirkham jakirkham closed this Nov 25, 2020


def _extract_serialize(x_items, x2, ser, bytestrings, path=()):
q = deque()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're using this as only a stack then maybe this should be a list rather than a deque

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I tried both. This performed better based on the benchmarking I did. That said, it seems that recursion itself is already handled quite efficiently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants