Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analyzing the DAG reader #5192

Closed
schomatis opened this issue Jul 5, 2018 · 4 comments
Closed

Analyzing the DAG reader #5192

schomatis opened this issue Jul 5, 2018 · 4 comments
Assignees
Labels
topic/docs-ipfs Topic docs-ipfs topic/technical debt Topic technical debt topic/UnixFS Topic UnixFS

Comments

@schomatis
Copy link
Contributor

It is comprised of the group of structures in the unixfs.io package capable of doing a DFS graph traversal of a file DAG. It reads (and performs similar operations) in the same sequence the DAG was built, so the importer is a good reference (once #5118 gets merged) to better understand it. I leave some notes here on what could be improved to make it more accessible to the new reader.

The recursive approach is hidden behind the PBDagReader (parent) and its ReadSeekCloser (child) attribute. The first doesn't call the second, instead it interacts with the interface which the second implements. The interface, although it provides a nice generalization, it obscures the very simple algorithm to traverse a file DAG that only has internal ProtoNodes and leaf RawNodes (represented at the UnixFS layer as File and Raw types). This could be made more explicit to leverage the concepts that the reader could already have of what a file DAG is (e.g., if it's already familiarized with the importer). Could an iterative algorithm help clarify how the reader operates? Instead of the implicit call stack with the PBDagReaders at different depths of the tree (making up the path to the current leaf being read) we could have an explicit stack with the information of each edge of the path at the different depths (this stack would pretty much be the reader itself with a few other attributes of control).

Node promises. Many nodes are being requested at the same time (preloaded) to streamline the graph traversal, this mechanism is abstracted through the ipld.NodePromise structure and it should be made very clear to the reader what a promise is, why is it used (see #5162 (comment)). The name of the promises attribute of the PBDagReader should be changed to reflect that those are actually the child nodes (that will be visited in the future), to focus on what they represent instead of how those nodes are fetched (through promises).

The direct manipulation of the protobuf unixfs.pb.Data structure in PBDagReader.pbdata should be replaced with the more refined unixfs.FSNode structure (which contains unixfs.pb.Data).

Remove unused node attribute (*mdag.ProtoNode) of the PBDagReader.

As said before the role of the buffer (buf attribute) may be misleading, it should be clear that this is the child node one level below being accessed by another PBDagReader (or BufDagReader in case we've reached the leaf). Using the ReadSeekCloser interface instead of DagReader (which contains it) is correct but may be confusing if not clarified well enough. (Revisit this point.)

The WriteTo method has almost an identical logic as CtxReadFull but its code is structured differently (especially the error handling part) which makes it harder to assimilate what it does (or more precisely, how it does it).

Read and Seek (from io.SeekStart) do very similar traversal operations, the first one counting sizes and the second reading file content (actually both are performing reading operations on UnixFS nodes), could that part of the code be merged? Maybe a new structure could be introduced (at the unixfs level) that would handle how to traverse a DAG, this structure could receive a method of what to do in each node it visits during the DFS (something like DFSTraversalFileDAG or DFSFileDAG).

The name io of the package that contains it is a bit misleading, the DAG modifier that would normally be also considered as standard I/O has its own mod package, maybe the DAG reader could be moved to a separate reader package and rethink what the io package should contain (if anything).

@schomatis schomatis added topic/docs-ipfs Topic docs-ipfs topic/technical debt Topic technical debt topic/UnixFS Topic UnixFS labels Jul 5, 2018
@schomatis schomatis added this to the Files API Documentation milestone Jul 5, 2018
@schomatis schomatis self-assigned this Jul 5, 2018
@schomatis
Copy link
Contributor Author

Forgot to mention that ipfs cat seems like the best entry point command to follow the code path that uses the DAG reader methods. Maybe an example can be added here or directly in #5052.

@schomatis
Copy link
Contributor Author

https://github.com/ipfs/go-ipfs/blob/a9efa7e201707bd1a0d48fe1146a76ec0097db92/unixfs/io/pbdagreader.go#L60

I'm unclear in what case that buf's contents are being used and aren't instead replaced inside the precalcNextBuf call, is it for the base (leaf node) case?

@schomatis
Copy link
Contributor Author

The DAG reader has been completely reworked in ipfs/go-unixfs#60.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic/docs-ipfs Topic docs-ipfs topic/technical debt Topic technical debt topic/UnixFS Topic UnixFS
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant