Fast (parallel) Traversal For A DAG That Stops At Arbitrary Points #5487
Comments
Hey @hannahhoward, I'm currently working on a PR (ipfs/go-ipld-format#39) that has many similarities with the requirements you're describing, but it still has a long way to go to meet all of them. It's basically a structure to walk DAGs providing a [...]. If you have the time, it would be great if you could take a look at it and tell me what you think of it. I'm always interested in finding use cases besides the original motivation for it, which was decoupling some of the DAG reader logic (https://github.com/ipfs/go-unixfs/pull/12/files), since I would like to arrive at an API that may be useful in more cases than just that one.
Also, welcome to the team! 🎉
@schomatis thank you for the pointer -- I need to review it more fully and will give more feedback shortly. Have you also seen the code here: https://github.com/ipfs/go-merkledag/blob/master/merkledag.go#L363 ? This code specifically deals with (abstract? - can't tell) DAG traversal and seems to have good parallelism, though I think without the level of control over how traversal is done that your code has. I'm trying to figure out what needs to be used where.
Yes. Anyway, there are many parts of the code base where similar DAG traversal logic takes place, and the idea would be to abstract them into a single API (where reasonable) to simplify the code.
Yeah, I need to leave a comment on your PR, because I've reviewed the code and it's really awesome. @eingenito + @michaelavila think so too. For the in-order case, and also for general tree walking, it's great. The big challenge here is that we're just trying to get nodes as fast as possible in the use case of LS, which requires making requests for a node's links as soon as we have them and in parallel as much as possible. Effectively, we want to be walking the tree in multiple directions in an effort to discover as many blocks to fetch as we can, as quickly as possible. And then, if order becomes a requirement, deal with that through sorting later. Anyway, I will put a comment on your PR, and I think there is still potential to incorporate it for this use case in the future.
Yes, agreed. Actually, @hsanjuan had already proposed something like this in #5257 (comment). I'll open an issue about it.
@hannahhoward this is moot now, right?
Summary
Given a root CID, I would like to traverse that node's DAG quickly (making requests for children in parallel), and stop traversal of links on a per node basis using logic I provide.
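A minimal sketch of the kind of traversal described above, not the go-ipfs implementation. It assumes a hypothetical `Node` type (an ID plus child links), a hypothetical `Fetcher` function standing in for whatever retrieves a node, and a caller-supplied `follow` predicate that decides per node whether its links are traversed. Results are sorted at the end because fetch order is nondeterministic.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Node is a hypothetical stand-in for a DAG node: an ID plus links to children.
type Node struct {
	ID    string
	Links []string
}

// Fetcher is a hypothetical stand-in for whatever retrieves a node by ID.
type Fetcher func(id string) (Node, error)

// Walk fetches every node reachable from root, requesting children in parallel.
// follow is called once per fetched node (under a lock); returning false stops
// traversal of that node's links. At most `concurrency` fetches run at once.
// The first fetch error is returned; other branches are not cancelled.
func Walk(root string, fetch Fetcher, follow func(Node) bool, concurrency int) ([]string, error) {
	if concurrency < 1 {
		concurrency = 1
	}
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		seen     = map[string]bool{root: true}
		visited  []string
		firstErr error
	)
	sem := make(chan struct{}, concurrency) // bounds in-flight fetches

	var visit func(id string)
	visit = func(id string) {
		defer wg.Done()

		sem <- struct{}{}
		n, err := fetch(id)
		<-sem

		mu.Lock()
		defer mu.Unlock()
		if err != nil {
			if firstErr == nil {
				firstErr = err
			}
			return
		}
		visited = append(visited, id)
		if !follow(n) {
			return // caller asked to stop below this node
		}
		for _, l := range n.Links {
			if !seen[l] { // skip nodes already scheduled via another path
				seen[l] = true
				wg.Add(1)
				go visit(l)
			}
		}
	}

	wg.Add(1)
	go visit(root)
	wg.Wait()

	sort.Strings(visited) // impose an order only after the parallel walk
	return visited, firstErr
}

func main() {
	// Toy in-memory DAG used only for illustration.
	dag := map[string]Node{
		"root": {ID: "root", Links: []string{"a", "b"}},
		"a":    {ID: "a", Links: []string{"a1", "a2"}},
		"b":    {ID: "b", Links: []string{"b1"}},
		"a1":   {ID: "a1"},
		"a2":   {ID: "a2"},
		"b1":   {ID: "b1"},
	}
	fetch := func(id string) (Node, error) {
		n, ok := dag[id]
		if !ok {
			return Node{}, fmt.Errorf("node %q not found", id)
		}
		return n, nil
	}
	// Do not descend below "b".
	ids, err := Walk("root", fetch, func(n Node) bool { return n.ID != "b" }, 4)
	fmt.Println(ids, err) // prints: [a a1 a2 b root] <nil>
}
```

The stop decision is made per node, so a node such as `b` is still returned even though its children are never requested. A real version would likely take a `context.Context` for cancellation and stream results instead of collecting them.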
Use Case
Listing large sharded directories takes prohibitively long because link traversal in the DAG is serialized. We need to speed this up. (#4908)
Requirements / Acceptance Criteria
Requirements:
First Steps
Additional Optimizations
Not Included