implement trickledag for faster unixfs operations #713
(I think that if you rebase on master, that error will be fixed.)
I think the right thing to do with all these data structures is to set up a benchmark suite that tests various types of workloads. It may be that we find one or two that stand out, or that different data structures will be better for different use cases. Re-indexing the same data blocks might be fine, so that we have "different handles" on the same content.
		t.Fatal(err)
	}
}
maybe add some benchmarks to this pkg?
Also, these tests only exercise it from the outside. It would be useful to test that the implementation actually creates a well-formed structure. Maybe add a test that checks the structure produced?
Few comments, otherwise LGTM
implement trickledag for faster unixfs operations
Alright, so I've come up with a new tree structure optimized for both streaming AND seeking through a given file. This improves upon both the ext4 structure (which is mainly aimed at on-disk filesystems) and the "List of Lists" idea I previously commented about.
The downside of the ext4-style tree layout was that, as you got farther into the file, the number of requests needed to get data increased. I noticed this problem and came up with the "List of Lists" layout, which would work fantastically for a sequential stream. The issue comes when you try to seek through it: the top-level node is weighted very heavily to one side, so it is 'narrow' from the data's perspective. Seeking therefore requires O(n) requests to find the desired location in the file, where ext4 was roughly O(log(n)).
The Trickle{Tree,Dag} addresses both of these concerns: each request after the first can return actual file data, and the cost of seeking remains near O(log(n)) since it has a recursive tree structure. A visualization of it would look like the ext4 tree, but instead of having iteratively deeper 'balanced' trees, it has an iteratively deeper version of itself. The primary tenet of its design is "Data at every layer".
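To make the "data at every layer" idea concrete, here is a minimal sketch of how such a layout could be built. It is not the code from this PR: the `Node` type, the `fill`, `Build` and `Walk` functions, and the `directLeaves` and `layerRepeat` constants are all hypothetical names and values chosen for illustration, and chunks are plain strings rather than real blocks. Every node first gets a few data leaves directly, then subtrees of depth 1, 2, 3, ... that are each built the same way.

```go
package main

import "fmt"

// Hypothetical parameters, chosen small so the shape is easy to follow.
const (
	directLeaves = 2 // data leaves attached directly to every node
	layerRepeat  = 2 // number of subtrees built at each depth
)

// Node is a leaf when it has no children; only leaves carry data.
type Node struct {
	Data     string
	Children []*Node
}

// fill attaches chunks to n and returns the chunks it did not consume.
// A node first takes directLeaves data leaves ("data at every layer"),
// then layerRepeat subtrees of depth 1, then of depth 2, and so on up to
// maxDepth-1. A negative maxDepth means "no limit" and is used for the root.
func fill(n *Node, chunks []string, maxDepth int) []string {
	for i := 0; i < directLeaves && len(chunks) > 0; i++ {
		n.Children = append(n.Children, &Node{Data: chunks[0]})
		chunks = chunks[1:]
	}
	for depth := 1; (maxDepth < 0 || depth < maxDepth) && len(chunks) > 0; depth++ {
		for i := 0; i < layerRepeat && len(chunks) > 0; i++ {
			child := &Node{}
			chunks = fill(child, chunks, depth)
			n.Children = append(n.Children, child)
		}
	}
	return chunks
}

// Build lays out all chunks under a single root.
// The sketch assumes at least one chunk is supplied.
func Build(chunks []string) *Node {
	root := &Node{}
	fill(root, chunks, -1)
	return root
}

// Walk visits leaves in file order, reporting how many links
// separate each leaf from the root.
func Walk(n *Node, depth int, visit func(data string, depth int)) {
	if len(n.Children) == 0 {
		visit(n.Data, depth)
		return
	}
	for _, c := range n.Children {
		Walk(c, depth+1, visit)
	}
}

func main() {
	var chunks []string
	for i := 0; i < 10; i++ {
		chunks = append(chunks, fmt.Sprintf("c%d", i))
	}
	root := Build(chunks)
	Walk(root, 0, func(data string, depth int) {
		fmt.Printf("%s is %d hop(s) below the root\n", data, depth)
	})
}
```

With these toy parameters and ten chunks, the first two chunks hang directly off the root, while the last two (`c8`, `c9`) sit three links down, so the start of the file is reachable immediately and depth grows slowly toward the end.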
An example layout is here:
http://gateway.ipfs.io/ipfs/QmRPfwo1XQErHDXpeCnJ7j92ibGNTBxkrmBFCbvEa78gZB