Skip to content
This repository has been archived by the owner on Jul 21, 2023. It is now read-only.

fix: use utils.mapParallel for parallel processing of peers #166

Merged
merged 1 commit into from
Jun 16, 2020

Conversation

kumavis
Copy link
Contributor

@kumavis kumavis commented Dec 12, 2019

Fixes #164

There's still one more place where we handle things serially: handling incoming messages for a single peer. I decided not to touch it. It could be addressed with paramap-it, likely with ordered: false option

Copy link
Member

@alanshaw alanshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some performance bottleneck that this is addressing? What are the performance characteristics of this new code? I'm not against parallelizing things but I'd want to be sure we're genuinely addressing problems and not causing the dht to spin our CPUs because of unbounded parallelism.

for await (const item of asyncIterator) {
tasks.push(asyncFn(item))
}
return Promise.all(tasks)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to wait for asyncIterator to end before parallel mapping over the items it yields. This is a little unusual and potentially buffers a large amount of data in memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to wait for asyncIterator to end before parallel mapping over the items it yields

i dont understand what you mean. that is neither a requirement we have nor the behavior of this code.

each item starts its asyncFn immediately when yielded. handling of each asyncFn does not block further iteration. In both my uses all I needed was to know when all have completed, so I returned a Promise.all

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah you’re correct - my apologies I didn’t notice the function call!

@alanshaw
Copy link
Member

Worth mentioning and maybe using here because it has concurrency limits:
https://www.npmjs.com/package/streaming-iterables#parallelmap

@kumavis
Copy link
Contributor Author

kumavis commented Dec 13, 2019

Is there some performance bottleneck that this is addressing?

not an experienced perf bottlneck, but this was trying to fix an architectural regression I noticed where network activity became done in serial while it was previously done in parallel. (though previously it waited for all peers before processing).

adding a concurrency limit seems reasonable

I expect performance to be blocked primarily by network i/o, so eagerly getting messages out seems like the correct answer, and doing it in series seems like a very questionable choice. If its intentionally time inefficient, it should have a comment explaining that.

@kumavis
Copy link
Contributor Author

kumavis commented Dec 13, 2019

Here's a quick experiment testing mapParallel perf vs the previous behavior. if iterating and the task take the same amount of time, its 2x fast. Significantly faster if asyncFn is slower. Marginally faster if iterator is slower.
https://gist.github.com/kumavis/3032e17c351e4e1f4874922d0481de09

@kumavis
Copy link
Contributor Author

kumavis commented Dec 13, 2019

I guess one concern is if asyncFn rejects before we call Promise.all, it would be considered and unhandled promise

@vasco-santos
Copy link
Member

I guess one concern is if asyncFn rejects before we call Promise.all, it would be considered and unhandled promise

what if we use p-queue? we can also control concurrency

@vasco-santos
Copy link
Member

Hey @kumavis . Should we get this to the finish line? Or include it when we work on updating this to conform with the new updates in the go implementation of the dht?

@kumavis
Copy link
Contributor Author

kumavis commented May 29, 2020

work is done, hit that merge. or add your own commits if you want the p-queue

@vasco-santos
Copy link
Member

FYI I just rebased this PR and waiting on CI

@vasco-santos vasco-santos merged commit 534a2d9 into master Jun 16, 2020
@vasco-santos vasco-santos deleted the mapParallel branch June 16, 2020 09:08
Copy link
Member

@vasco-santos vasco-santos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI I will consider adding the p-queue when we refactor the dht

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

for-await handled serially
3 participants