Analyzing the DAG reader #5192

schomatis · 2018-07-05T13:17:07Z

It is comprised of the group of structures in the unixfs.io package capable of doing a DFS graph traversal of a file DAG. It reads (and performs similar operations) in the same sequence the DAG was built, so the importer is a good reference (once #5118 gets merged) to better understand it. I leave some notes here on what could be improved to make it more accessible to the new reader.

The recursive approach is hidden behind the PBDagReader (parent) and its ReadSeekCloser (child) attribute. The first doesn't call the second, instead it interacts with the interface which the second implements. The interface, although it provides a nice generalization, it obscures the very simple algorithm to traverse a file DAG that only has internal ProtoNodes and leaf RawNodes (represented at the UnixFS layer as File and Raw types). This could be made more explicit to leverage the concepts that the reader could already have of what a file DAG is (e.g., if it's already familiarized with the importer). Could an iterative algorithm help clarify how the reader operates? Instead of the implicit call stack with the PBDagReaders at different depths of the tree (making up the path to the current leaf being read) we could have an explicit stack with the information of each edge of the path at the different depths (this stack would pretty much be the reader itself with a few other attributes of control).

Node promises. Many nodes are being requested at the same time (preloaded) to streamline the graph traversal, this mechanism is abstracted through the ipld.NodePromise structure and it should be made very clear to the reader what a promise is, why is it used (see #5162 (comment)). The name of the promises attribute of the PBDagReader should be changed to reflect that those are actually the child nodes (that will be visited in the future), to focus on what they represent instead of how those nodes are fetched (through promises).

The direct manipulation of the protobuf unixfs.pb.Data structure in PBDagReader.pbdata should be replaced with the more refined unixfs.FSNode structure (which contains unixfs.pb.Data).

Remove unused node attribute (*mdag.ProtoNode) of the PBDagReader.

As said before the role of the buffer (buf attribute) may be misleading, it should be clear that this is the child node one level below being accessed by another PBDagReader (or BufDagReader in case we've reached the leaf). Using the ReadSeekCloser interface instead of DagReader (which contains it) is correct but may be confusing if not clarified well enough. (Revisit this point.)

The WriteTo method has almost an identical logic as CtxReadFull but its code is structured differently (especially the error handling part) which makes it harder to assimilate what it does (or more precisely, how it does it).

Read and Seek (from io.SeekStart) do very similar traversal operations, the first one counting sizes and the second reading file content (actually both are performing reading operations on UnixFS nodes), could that part of the code be merged? Maybe a new structure could be introduced (at the unixfs level) that would handle how to traverse a DAG, this structure could receive a method of what to do in each node it visits during the DFS (something like DFSTraversalFileDAG or DFSFileDAG).

The name io of the package that contains it is a bit misleading, the DAG modifier that would normally be also considered as standard I/O has its own mod package, maybe the DAG reader could be moved to a separate reader package and rethink what the io package should contain (if anything).

The text was updated successfully, but these errors were encountered:

schomatis · 2018-07-05T14:14:42Z

Forgot to mention that ipfs cat seems like the best entry point command to follow the code path that uses the DAG reader methods. Maybe an example can be added here or directly in #5052.

schomatis · 2018-07-11T15:42:48Z

Standardize the use of the PB and Buffer readers,

https://github.com/ipfs/go-ipfs/blob/a9efa7e201707bd1a0d48fe1146a76ec0097db92/unixfs/io/dagreader.go#L57-L58

https://github.com/ipfs/go-ipfs/blob/a9efa7e201707bd1a0d48fe1146a76ec0097db92/unixfs/io/pbdagreader.go#L118-L123

schomatis · 2018-07-11T15:47:16Z

https://github.com/ipfs/go-ipfs/blob/a9efa7e201707bd1a0d48fe1146a76ec0097db92/unixfs/io/pbdagreader.go#L60

I'm unclear in what case that buf's contents are being used and aren't instead replaced inside the precalcNextBuf call, is it for the base (leaf node) case?

schomatis · 2019-01-30T03:54:19Z

The DAG reader has been completely reworked in ipfs/go-unixfs#60.

schomatis added topic/docs-ipfs Topic docs-ipfs topic/technical debt Topic technical debt topic/UnixFS Topic UnixFS labels Jul 5, 2018

schomatis added this to the Files API Documentation milestone Jul 5, 2018

schomatis self-assigned this Jul 5, 2018

schomatis mentioned this issue Jul 5, 2018

pbdagreader: use FSNode instead of protobuf structure #5189

Merged

schomatis mentioned this issue Jul 10, 2018

Analyzing the DAG modifier #5211

Open

This was referenced Jul 11, 2018

unixfs: fix dagTruncate to preserve node type #5216

Merged

unixfs: refactor precalcNextBuf #5237

Merged

[WIP] unixfs: decouple the DAG traversal logic from the DAG reader #5257

Closed

schomatis mentioned this issue Sep 26, 2018

decouple the DAG traversal logic from the DAG reader ipfs/go-unixfs#12

Closed

4 tasks

schomatis closed this as completed Jan 30, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Analyzing the DAG reader #5192

Analyzing the DAG reader #5192

schomatis commented Jul 5, 2018

schomatis commented Jul 5, 2018

schomatis commented Jul 11, 2018

schomatis commented Jul 11, 2018

schomatis commented Jan 30, 2019

Analyzing the DAG reader #5192

Analyzing the DAG reader #5192

Comments

schomatis commented Jul 5, 2018

schomatis commented Jul 5, 2018

schomatis commented Jul 11, 2018

schomatis commented Jul 11, 2018

schomatis commented Jan 30, 2019