This repository has been archived by the owner on Aug 12, 2020. It is now read-only.

feat: support --raw-leaves #219

Merged
achingbrain merged 1 commit into master from support-raw-leaf-nodes
Jul 19, 2018

@achingbrain achingbrain commented Jul 17, 2018

Goes some way towards fixing ipfs/js-ipfs#1432 - will need follow-up PRs for js-ipfs-mfs and js-ipfs itself (🔜).

There are three ways of importing a file that we need to support, and each will end up with a slightly different DAG structure.

  1. ipfs add will result in a balanced DAG with leaf nodes that are unixfs nodes of type file
  2. ipfs files write results in a trickle DAG with leaf nodes that are unixfs nodes of type raw
  3. ipfs add --raw-leaves and ipfs files write --raw-leaves produce the balanced/trickle DAGs described above, but the leaf nodes are chunks of file data not wrapped in protobufs.

In all cases above the root node is a unixfs file node with a v0 CID, unless you specify --cid-version=1.

This PR:

  • Changes the meaning of the existing rawLeaves argument. It now means the leaf node is just data - a chunk of the file - which is consistent with go. Previously it meant a unixfs node with type raw. So far the only code using this is js-ipfs-mfs, so changing it shouldn't be too disruptive.
  • Adds a leafType option which can be file or raw - when --raw-leaves is false, this is what the unixfs leaf type will be.
  • Uses CIDv1 for raw leaves with the codec raw
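
As a rough illustration of how the two options interact, the sketch below maps importer options to the kind of leaf they would produce. `describeLeaf` is a hypothetical helper, not part of this module, and the `'file'` default for `leafType` is an assumption based on the `ipfs add` behaviour described above.

```javascript
// Hypothetical helper for illustration only; not part of the importer.
// Assumes leafType defaults to 'file' and cidVersion defaults to 0.
function describeLeaf (options = {}) {
  const { rawLeaves = false, leafType = 'file', cidVersion = 0 } = options

  if (rawLeaves) {
    // Leaf is a bare chunk of file data: no UnixFS protobuf wrapper,
    // addressed by a v1 CID with the `raw` codec.
    return { wrapped: false, unixfsType: null, codec: 'raw', cidVersion: 1 }
  }

  // Leaf is a UnixFS node of the requested type, stored as dag-pb.
  return { wrapped: true, unixfsType: leafType, codec: 'dag-pb', cidVersion }
}

console.log(describeLeaf())                     // comparable to `ipfs add`
console.log(describeLeaf({ leafType: 'raw' }))  // comparable to `ipfs files write`
console.log(describeLeaf({ rawLeaves: true }))  // comparable to `--raw-leaves`
```

Note that `leafType` only matters when `rawLeaves` is false; with raw leaves there is no UnixFS wrapper to carry a type.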

parkan commented Jul 17, 2018

Great, this makes raw-leaves behavior more consistent between js and go, as well 😄

@alanshaw alanshaw left a comment

👍 Looks good so far - I've not had time to run tests and I'd like to do another pass over the code when I get a moment but leaving my feedback anyway 😛

@@ -149,7 +149,7 @@ The input's file paths and directory structure will be preserved in the [`dag-pb
- `onlyHash` (boolean, defaults to false): Only chunk and hash - do not write to disk
- `hashAlg` (string): multihash hashing algorithm to use
- `cidVersion` (integer, default 0): the CID version to use when storing the data (storage keys are based on the CID, _including_ it's version)
-- `rawLeafNodes` (boolean, defaults to false): When a file would span multiple DAGNodes, if this is true the leaf nodes will be marked as `raw` `unixfs` nodes
+- `rawLeaves` (boolean, defaults to false): When a file would span multiple DAGNodes, if this is true the leaf nodes will not be wrapped in `UnixFS` protobufs and will instead contain the raw file bytes

Document leafType here?

Collaborator Author

Done!

const file = new UnixFS(options.leafType, buffer)

DAGNode.create(file.marshal(), [], options.hashAlg, (err, node) => {
let cid = new CID(0, 'dag-pb', node.multihash)

Handle err before this? node would be null right?
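
A minimal sketch of the check being asked for, assuming a Node-style callback. `createNode` and `buildLeaf` are hypothetical stand-ins written for this example, not the real `DAGNode.create` or importer code.

```javascript
// Hypothetical stand-in for a callback-style node factory.
// On failure it calls back with an error and no node.
function createNode (data, callback) {
  if (!data) return callback(new Error('no data'))
  callback(null, { multihash: 'hash-of-' + data })
}

function buildLeaf (data, callback) {
  createNode(data, (err, node) => {
    // Bail out before touching `node`, which is undefined on error.
    if (err) return callback(err)

    callback(null, { id: node.multihash })
  })
}

buildLeaf('chunk', (err, leaf) => console.log(err, leaf))
buildLeaf('', (err, leaf) => console.log(err && err.message, leaf))
```

Without the early return, reading `node.multihash` on the error path would throw instead of passing the original error to the caller.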


if (options.cidVersion === 1) {
cid = cid.toV1()
}

Super minor suggestion - you can probably short circuit this and just pass options.cidVersion || 0 to the constructor above. toV1 will create a new instance and validate it.

Same same for reduce.js
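
A rough sketch of the two patterns side by side. `FakeCID` is a hypothetical stand-in used so the example runs on its own; it is not the real CID class and only mimics the constructor arguments and `toV1` call seen in the snippet above.

```javascript
// Hypothetical stand-in for a CID class; counts how many instances are built.
class FakeCID {
  constructor (version, codec, multihash) {
    this.version = version
    this.codec = codec
    this.multihash = multihash
    FakeCID.created++
  }

  toV1 () {
    return new FakeCID(1, this.codec, this.multihash)
  }
}
FakeCID.created = 0

// Pattern under review: build a v0 CID, then convert if v1 was requested.
function cidTwoStep (multihash, options) {
  let cid = new FakeCID(0, 'dag-pb', multihash)
  if (options.cidVersion === 1) {
    cid = cid.toV1()
  }
  return cid
}

// Suggested pattern: pass the requested version straight to the constructor.
function cidOneStep (multihash, options) {
  return new FakeCID(options.cidVersion || 0, 'dag-pb', multihash)
}

console.log(cidTwoStep('mh', { cidVersion: 1 }).version)
console.log(cidOneStep('mh', { cidVersion: 1 }).version)
```

Both functions return a CID with the requested version, but the one-step form constructs a single instance when `cidVersion` is 1.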

@achingbrain achingbrain force-pushed the support-raw-leaf-nodes branch from 438f3b4 to 9105db1 Compare July 17, 2018 16:23

@alanshaw alanshaw left a comment

I'm happy to approve after the one requested change. This LGTM and tests all pass.

}
-], (err, node) => {
+], (err, {node, cid}) => {

You can't destructure like this in a callback.

If err is set, the result will be undefined/null, and the callback will still try to destructure its properties and throw.
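
A small self-contained sketch of the failure mode and one way around it. The handler names are made up for this example and are not taken from the PR.

```javascript
// Destructuring in the parameter list runs before the function body,
// so it throws when the callback is invoked with only an error.
const unsafe = (err, { node, cid }) => {
  if (err) return 'handled: ' + err.message
  return node + ':' + cid
}

// Safer: take the result as a plain parameter and destructure after the check.
const safe = (err, result) => {
  if (err) return 'handled: ' + err.message
  const { node, cid } = result
  return node + ':' + cid
}

let thrown = null
try {
  unsafe(new Error('boom'))
} catch (e) {
  thrown = e
}

console.log(thrown instanceof TypeError)
console.log(safe(new Error('boom')))
console.log(safe(null, { node: 'n', cid: 'c' }))
```

In the unsafe version the `if (err)` check is never reached on the error path, so the original error is replaced by a `TypeError`.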

@achingbrain achingbrain force-pushed the support-raw-leaf-nodes branch from 9105db1 to 7a29d83 Compare July 18, 2018 16:39

@daviddias daviddias left a comment

Great work @achingbrain ❤️! Didn't see anything alarming but I must confess that this module is starting to get a bit unwieldy.

It seems it is time to break the multiple importers into their own packages (and document them nicely).

-DAGNode.create(fileNode.marshal(), [], options.hashAlg, (err, node) => {
-cb(err, { DAGNode: node, fileNode: fileNode })
+DAGNode.create(fileNode.marshal(), [], options.hashAlg, (error, node) => {
+cb(error, { DAGNode: node, fileNode: fileNode })

Nitpick, why s/err/error?

Collaborator Author

Personal preference really - I think short variable names make the code less expressive.

@achingbrain
Collaborator Author

> this module is starting to get a bit unwieldy

Agreed, it's long overdue a refactor - there's a lot of conditional logic that could be eliminated.

> It seems it is time to break the multiple importers into their own packages (and document them nicely).

Possibly - if this module was decomposed into lots of smaller modules, could you see them being consumed independently?

It would certainly introduce a bunch of administrative overhead - maybe we could trial it as a monorepo to alleviate some of that burden?

@achingbrain achingbrain merged commit 8d0b025 into master Jul 19, 2018
@ghost ghost removed the status/in-progress In progress label Jul 19, 2018
@achingbrain achingbrain deleted the support-raw-leaf-nodes branch July 19, 2018 09:14