Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: exporter concurrent blockstore get #237

Closed
wants to merge 1 commit into from

Conversation

alanshaw
Copy link
Member

@alanshaw alanshaw commented Jul 21, 2022

Here's a super quick hack I did to prove the usefulness of #236

In this PR we just concurrently fetch ALL the blocks from the blockstore if the requested byte range is the entirety of the file.

This is likely not what you want to merge, but I just wanted to see something similar working in practice.

In the 1st video, bitswap is having to send a message with a single block each time. We get ~40 blocks in ~30s:

before.mov

In the 2nd video there's a delay before receiving blocks but then we get 172 in rapid succession after ~20s. I think the delay before receiving blocks in the second video is the server side fetching the blocks from it's blockstore (clearly not very perfromantly!):

after.mov

@alanshaw
Copy link
Member Author

P.S. I was not able to npm install in this repo after a fresh clone or run any tests...

Screenshot 2022-07-21 at 23 11 57

@achingbrain
Copy link
Member

In the 2nd video there's a delay before receiving blocks

If the code behind the video is the code in this PR, the delay might be due to the use of Promise.all here - it'll wait to load all 174 blocks before processing any of them - looks like you fixed that up in your fork though 👍

I've tidied up the idea here in #249 - it doesn't have a concurrency parameter right now - trivial to add but it's also limited by the number of DAGLinks a given DAGNode can have so maybe it's not necessary?


Something I wanted to explore with this sort of optimisation is to switch it around to use getMany (though I think the return type needs to change to AsyncIterable<Pair<Key, Value>> first) - that would simplify the code here and shift the concurrency control to the blockstore, so for example if your blockstore is talking to a rate-limited API you might want to limit overall concurrency there, or not at all if it's in-memory.

achingbrain added a commit that referenced this pull request Aug 17, 2022
A polishing of #237.  Uses `it-parallel` to load a whole list of children of a DAG node in parallel rather than one at a time.  Makes fetching large files much faster.
github-actions bot pushed a commit that referenced this pull request Aug 17, 2022
## [ipfs-unixfs-exporter-v8.0.2](ipfs-unixfs-exporter-v8.0.1...ipfs-unixfs-exporter-v8.0.2) (2022-08-17)

### Bug Fixes

* parallelise loading of dag-pb links when exporting ([#249](#249)) ([862d63b](862d63b)), closes [#237](#237)
@achingbrain
Copy link
Member

Closing in favour of #249 - thanks for opening this!

@achingbrain achingbrain deleted the perf/exporter-concurrent-blockstore-get branch August 26, 2022 09:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants