fix: use utils.mapParallel for parallel processing of peers #166
Conversation
Is there some performance bottleneck that this is addressing? What are the performance characteristics of this new code? I'm not against parallelizing things but I'd want to be sure we're genuinely addressing problems and not causing the dht to spin our CPUs because of unbounded parallelism.
for await (const item of asyncIterator) {
  tasks.push(asyncFn(item))
}
return Promise.all(tasks)
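For context, a hedged reconstruction of the complete helper this excerpt appears to belong to (the function signature and the tasks array are assumptions inferred from the excerpt; only the loop and the Promise.all come from the diff):

async function mapParallel (asyncIterator, asyncFn) {
  const tasks = []
  // start asyncFn for each item as soon as it is yielded; iteration does not
  // wait for earlier calls to settle
  for await (const item of asyncIterator) {
    tasks.push(asyncFn(item))
  }
  // resolves once every started call has fulfilled, rejects on the first failure
  return Promise.all(tasks)
}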
We have to wait for asyncIterator to end before parallel mapping over the items it yields. This is a little unusual and potentially buffers a large amount of data in memory.
"We have to wait for asyncIterator to end before parallel mapping over the items it yields"

I don't understand what you mean. That is neither a requirement we have nor the behavior of this code. Each item starts its asyncFn immediately when yielded; handling of each asyncFn does not block further iteration. In both of my uses, all I needed was to know when all have completed, so I returned a Promise.all.
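A small self-contained demonstration of that claim (illustrative only, not code from this repo): each task logs as soon as its item is yielded, well before any slow task finishes.

const delay = (ms) => new Promise(resolve => setTimeout(resolve, ms))

// fake peer source: yields one id every 10 ms
async function * peers () {
  for (const id of ['a', 'b', 'c']) {
    await delay(10)
    console.log('yielded', id)
    yield id
  }
}

async function main () {
  const tasks = []
  for await (const peer of peers()) {
    tasks.push((async () => {
      console.log('started', peer)   // logged immediately after the yield
      await delay(100)               // slow "network" work
      console.log('finished', peer)
    })())
  }
  await Promise.all(tasks)
  // output: every 'started' line appears right after its 'yielded' line,
  // long before the first 'finished' line
}

main()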
Ah, you're correct. My apologies, I didn't notice the function call!
Worth mentioning and maybe using here because it has concurrency limits:
Not a perf bottleneck I've experienced, but this was trying to fix an architectural regression I noticed where network activity became serial when it was previously done in parallel (though previously it waited for all peers before processing). Adding a concurrency limit seems reasonable. I expect performance to be bounded primarily by network I/O, so eagerly getting messages out seems like the correct answer, and doing it in series seems like a very questionable choice. If it's intentionally time-inefficient, it should have a comment explaining that.
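If a concurrency limit is wanted, a minimal sketch of how it could look, assuming p-queue (the library named later in this thread; this is not code from the PR):

import PQueue from 'p-queue' // p-queue v7+ is ESM-only

// same contract as mapParallel, but never more than `concurrency` calls at once
async function mapParallelLimited (asyncIterator, asyncFn, concurrency = 8) {
  const queue = new PQueue({ concurrency })
  const tasks = []
  for await (const item of asyncIterator) {
    tasks.push(queue.add(() => asyncFn(item)))
  }
  return Promise.all(tasks)
}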
Here's a quick experiment testing mapParallel perf vs. the previous behavior: if iterating and the task take the same amount of time, it's 2x faster; significantly faster if asyncFn is slower; marginally faster if the iterator is slower.
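For reference, a rough self-contained version of that kind of comparison (illustrative; it is not the experiment linked from the original comment). With a 10 ms yield delay and a 10 ms task per item, the serial version takes roughly twice as long:

const delay = (ms) => new Promise(resolve => setTimeout(resolve, ms))

// yields n items, taking `ms` to produce each one
async function * source (n, ms) {
  for (let i = 0; i < n; i++) {
    await delay(ms)
    yield i
  }
}

const task = async (item) => delay(10) // simulated per-item work

async function serial () {
  for await (const item of source(50, 10)) {
    await task(item)              // previous behavior: one at a time
  }
}

async function parallel () {
  const tasks = []
  for await (const item of source(50, 10)) {
    tasks.push(task(item))        // mapParallel-style: start immediately
  }
  await Promise.all(tasks)
}

async function main () {
  let t = Date.now()
  await serial()
  console.log('serial  ', Date.now() - t, 'ms') // ~50 * (10 + 10) = ~1000 ms
  t = Date.now()
  await parallel()
  console.log('parallel', Date.now() - t, 'ms') // ~50 * 10 + 10 = ~510 ms
}

main()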
I guess one concern is that if asyncFn rejects before we call Promise.all, it would be considered an unhandled promise rejection.
what if we use
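One way to avoid that window, sketched here for illustration (not code from this PR, and not necessarily what the truncated suggestion above refers to), is to attach a handler to each task as it is created and surface any error afterwards:

async function mapParallelSettling (asyncIterator, asyncFn) {
  const tasks = []
  const errors = []
  for await (const item of asyncIterator) {
    // a handler is attached immediately, so an early rejection is never
    // reported as unhandled while the iterator is still producing items
    tasks.push(asyncFn(item).catch(err => { errors.push(err) }))
  }
  await Promise.all(tasks)                // cannot reject anymore
  if (errors.length > 0) throw errors[0]  // or aggregate the errors
}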
Hey @kumavis. Should we get this to the finish line? Or include it when we work on updating this to conform with the new updates in the go implementation of the dht?
Work is done, hit that merge. Or add your own commits if you want the p-queue.
Force-pushed from 8d66b2a to 12847a3 (Compare)
FYI I just rebased this PR and am waiting on CI.
FYI I will consider adding the p-queue when we refactor the dht.
Fixes #164
There's still one more place where we handle things serially: handling incoming messages for a single peer. I decided not to touch it. It could be addressed with paramap-it, likely with the ordered: false option.
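For illustration only (this is not paramap-it's actual API, just the general shape of handling messages in parallel without preserving order, with a concurrency cap):

// handles each message as it arrives, at most `limit` at a time, without
// preserving order; resolves when the source ends and all handlers finish
async function forEachParallel (source, handler, limit = 8) {
  const inFlight = new Set()
  const errors = []
  for await (const message of source) {
    const task = handler(message)
      .catch(err => { errors.push(err) })   // record instead of leaving unhandled
      .finally(() => inFlight.delete(task))
    inFlight.add(task)
    if (inFlight.size >= limit) {
      await Promise.race(inFlight)          // wait for a free slot
    }
  }
  await Promise.all(inFlight)               // drain the remaining handlers
  if (errors.length > 0) throw errors[0]    // or aggregate the errors
}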