This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

(WIP) feat: dag import and export to and from CAR files #2953

Closed
wants to merge 4 commits into from

rvagg (Member) commented Mar 26, 2020

Closes: #2745
Ref: ipfs/kubo#7011
Ref: ipfs/kubo#6870

Disclaimer: this is my first hack directly on this repo, so I've had a steep learning curve today figuring this out and I wouldn't be surprised if I have some things very wrong! Guidance gratefully accepted, or perhaps someone else would like to take ownership of this?

I thought it would be nice to widen the conversation about the new ipfs dag export and ipfs dag import commands being added in go-ipfs. This is approximate parity, minus some minor details (mostly noted in the code). There are no tests here yet; it would be good to share the same fixture suite that @ribasushi is building for go-car.

Because datastore-car uses the newer @ipld/block, that gets imported here (export uses a different Block than import). That leaves room for duplicated dependencies, which would be worth checking and minimising.

Notes:

  • export supports only a single root CID and applies a full DAG walk to it. In the future it's expected that you'll also be able to apply a selector to it (the default would behave like a *), and CARv2 would even store that selector with the root. It produces a "well-formed" CAR that is deterministic in theory. We need to test the limits of that determinism, but assuming the graph is walked in the same order, the output should be identical each time you run it with the same root in js-ipfs or go-ipfs.
  • import is lax by design: it accepts a CAR as a "bundle of blocks", perhaps with no root, perhaps with many roots, perhaps with roots that don't even exist in the CAR body. It just dumps the blocks into your store. Where a specified root is found in the body, it pins that root in your store. I suspect some details here will be refined in go-car and will need to be synced here. For one: go-car currently doesn't accept zero-root CAR files but JS does.
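
To make the determinism point concrete, here is a minimal sketch of a full DAG walk from a single root. It assumes a depth-first, pre-order traversal that follows links in the order they appear in each block and emits every block once; the actual order used by datastore-car and go-car is not stated in this thread. The `dag` map and the `walkDag` name are hypothetical stand-ins, not js-ipfs APIs.

```javascript
'use strict'

// Hypothetical in-memory DAG: each key is a CID string, each value lists the
// CIDs that block links to, in the order the links appear in the block.
// (Illustrative only; this is not the datastore-car or js-ipfs API.)
function walkDag (dag, root) {
  const seen = new Set()
  const order = []

  function visit (cid) {
    if (seen.has(cid)) return // each block is written at most once
    if (!dag.has(cid)) throw new Error(`missing block ${cid}`)
    seen.add(cid)
    order.push(cid)
    for (const link of dag.get(cid)) visit(link)
  }

  visit(root)
  return order
}

// A diamond: both 'a' and 'b' link to 'c'.
const dag = new Map([
  ['root', ['a', 'b']],
  ['a', ['c']],
  ['b', ['c']],
  ['c', []]
])

console.log(walkDag(dag, 'root'))
```

As long as two implementations agree on the traversal order and on writing each block only once, the same root should yield the same sequence of blocks, which is the precondition for byte-identical CAR output.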

Examples:

export:

$ ./src/cli/bin.js dag export QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm > xkcd.car

This results in a 108M file named xkcd.car containing a UnixFS mirror of XKCD from some point in time.

import:

$ ./src/cli/bin.js dag import xkcd.car
importing from xkcd.car...
pinned QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm
imported 7342 blocks

lazy simulation of multi file import:

$ ./src/cli/bin.js dag import xkcd.car xkcd.car
importing from xkcd.car...
pinned root QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm
importing from xkcd.car...
pinned root QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm
imported 14684 blocks

import from stdin:

$ cat xkcd.car | ./src/cli/bin.js dag import
importing from stdin...
pinned root QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm
imported 7342 blocks
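
The import behaviour in these examples can be sketched roughly as follows. This is a simplified model, not the code in this PR: `car` is assumed to be an already-decoded CAR and `store` an in-memory blockstore with a pin set, both hypothetical. It counts blocks as they are read rather than unique blocks stored, which is consistent with the double import above reporting 14684 (2 × 7342) blocks.

```javascript
'use strict'

// Hypothetical stand-ins: `car` is an already-decoded CAR
// ({ roots: string[], blocks: Array<{ cid, bytes }> }) and `store` is an
// in-memory blockstore plus pin set. Neither mirrors a real js-ipfs API.
function importCar (car, store) {
  const inBody = new Set()
  let count = 0

  for (const { cid, bytes } of car.blocks) {
    store.blocks.set(cid, bytes) // re-putting an existing block is harmless
    inBody.add(cid)
    count++ // counts blocks read, not unique blocks stored
  }

  // Lax by design: zero roots, many roots, or roots absent from the body are
  // all accepted; only roots actually present in the body get pinned.
  const pinned = car.roots.filter(cid => inBody.has(cid))
  for (const cid of pinned) store.pins.add(cid)

  return { count, pinned }
}

const store = { blocks: new Map(), pins: new Set() }
const car = {
  roots: ['root', 'not-in-body'],
  blocks: [
    { cid: 'root', bytes: new Uint8Array([1]) },
    { cid: 'leaf', bytes: new Uint8Array([2]) }
  ]
}

const first = importCar(car, store)
const second = importCar(car, store) // same file again, as in the example above
console.log(first.count + second.count, store.blocks.size, [...store.pins])
```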

@ribasushi

Regarding the xkcd archive I get the following in go-ipfs:

cmd/ipfs/ipfs dag export QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm | sha256sum 
Exported .car size:	112277914
5a49df60c4842e7fe71849410025a8a5a98e2d0e6e71f90b4913635f7df7250d  -

It would be nice to double-check that we converge bit-for-bit.

rvagg (Member, Author) commented Mar 26, 2020

👍

$ shasum -a 256 xkcd.car
5a49df60c4842e7fe71849410025a8a5a98e2d0e6e71f90b4913635f7df7250d  xkcd.car
$ ls -al xkcd.car
-rw-r--r-- 1 rvagg staff 112277914 Mar 26 20:55 xkcd.car
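
The same comparison can be scripted in Node so it could later run as part of a test. This is only a sketch using Node's built-in `crypto` and `fs` modules; the `fingerprint` and `sameExport` helpers and the file names in the usage comment are made up for illustration.

```javascript
'use strict'

const crypto = require('crypto')
const fs = require('fs')

// Returns the byte length and SHA-256 hex digest of a file, reading it in
// fixed-size chunks so a CAR of ~100 MB is never held in memory at once.
function fingerprint (path) {
  const hash = crypto.createHash('sha256')
  const buf = Buffer.alloc(1024 * 1024)
  const fd = fs.openSync(path, 'r')
  let size = 0
  try {
    let n
    // A null position means "continue from the current file offset".
    while ((n = fs.readSync(fd, buf, 0, buf.length, null)) > 0) {
      hash.update(buf.subarray(0, n))
      size += n
    }
  } finally {
    fs.closeSync(fd)
  }
  return { size, sha256: hash.digest('hex') }
}

// Two exports converge bit-for-bit only if both fields match.
function sameExport (a, b) {
  return a.size === b.size && a.sha256 === b.sha256
}

// Usage (hypothetical file names):
// console.log(sameExport(fingerprint('js-export.car'), fingerprint('go-export.car')))
```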

rvagg (Member, Author) commented Apr 9, 2020

export and import got merged into go-ipfs today: ipfs/kubo#7036 & ipfs/kubo#7038

Can someone in charge of js-ipfs tell me whether it's worth pursuing this PR further to match functionality (I think it's pretty close) and reproduce the excellent tests that Peter has included over there? There are objections in #2745 about code bloat and I don't want to waste time on something that might be rejected on that basis.

@achingbrain maybe?

rvagg (Member, Author) commented Jun 30, 2020

I've updated datastore-car upstream to handle this, and I've imported the sharness test from go-ipfs that covers this, but it seems like we're not set up to run sharness in a way that is fully compatible with go-ipfs. iptb seems to be missing and I'm not sure how far down this rabbit hole I should try to go because I have zero idea how deep it might be. @achingbrain or someone else, can you background me on sharness and the compatibility between here and go-ipfs? Is it a subset here for a reason, is it just a matter of fast-forwarding some functionality to match, or is it not possible, or not reasonable, to get compatibility? Maybe I need to rewrite the tests entirely in JS to avoid sharness?

Comment on lines +109 to +130
reset_blockstore 0
reset_blockstore 1

mkfifo pipe_testnet
mkfifo pipe_devnet

test_expect_success "fifo import" '
  (
    cat ../t0054-dag-car-import-export-data/lotus_testnet_export_128_shuffled_nulroot.car > pipe_testnet &
    cat ../t0054-dag-car-import-export-data/lotus_devnet_genesis_shuffled_nulroot.car > pipe_devnet &

    do_import 0 \
      pipe_testnet \
      pipe_devnet \
      ../t0054-dag-car-import-export-data/combined_naked_roots_genesis_and_128.car \
    > basic_fifo_import_actual
    result=$?

    wait || true # work around possible trigger of a bash bug on overloaded circleci
    exit "$result"
  )
'

@rvagg the exercise of the GC-lock and the import of FIFOs may not be something you are too interested in testing within js-ipfs. Just raising it here as it was a very important part of making 🗡️ viable.

Comment on lines +57 to +58
// TODO: ^ go-car currently attempts to pin roots even if they don't exist in
// the CAR body, need to align behaviour
ribasushi commented Jun 30, 2020

@rvagg this is done in go-ipfs to allow a copy-less "transactional" operation:

<some source> | stream-dagger <many options> --emit-stdout=car-v0-fifos-xargs | xargs -0 ipfs dag import

What that mode does is print 2 fifo names on stdout. The first fifo contains all the data. The second contains only the roots (because we can derive the roots only once we have streamed everything over). The full dag import context serves as a "transaction" of sorts, keeping GC at bay between the lengthy data stream and the pin at the very end.
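
A rough model of that "transaction", written in JavaScript only because that is what this repo would need: blocks from every input are stored first, roots are merely remembered, and pinning happens once at the end while a GC lock is held throughout. Everything here (`importTransaction`, the decoded `inputs`, the `gcLocked` flag) is a hypothetical simplification and not how go-ipfs is actually structured.

```javascript
'use strict'

// Hypothetical model: each input is an already-decoded CAR
// ({ roots: string[], blocks: Array<{ cid, bytes }> }); `store` is an
// in-memory blockstore with a pin set and a flag standing in for the GC lock.
function importTransaction (inputs, store) {
  const roots = new Set()
  store.gcLocked = true // held for the whole import, not per input
  try {
    for (const car of inputs) {
      for (const { cid, bytes } of car.blocks) store.blocks.set(cid, bytes)
      for (const cid of car.roots) roots.add(cid) // remember, do not pin yet
    }

    // Pin only after every input has been drained, so a roots-only input may
    // refer to blocks that arrived through a different input.
    const pinned = []
    const missing = []
    for (const cid of roots) {
      if (store.blocks.has(cid)) {
        store.pins.add(cid)
        pinned.push(cid)
      } else {
        missing.push(cid)
      }
    }
    return { pinned, missing }
  } finally {
    store.gcLocked = false
  }
}

const store = { blocks: new Map(), pins: new Set(), gcLocked: false }
const data = { roots: [], blocks: [{ cid: 'root', bytes: new Uint8Array([1]) }] }
const rootsOnly = { roots: ['root'], blocks: [] }

console.log(importTransaction([data, rootsOnly], store))
```

The point of the model is the ordering: because the pin is deferred, the roots-only input can name a block that was delivered by the data input.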

Whether js-ipfs needs to support this at the same level as go-ipfs is an open question. /cc @mikeal

Contributor

i’d wait to support this until a later date if js-ipfs wants to prioritize it.

@achingbrain
Member

> Maybe I need to rewrite the tests entirely in JS to avoid sharness?

Yes please - sorry for the misdirection. The sharness tests here aren't run and should really be deleted.

The testing strategy is (I really must put this in a doc):

CLI

Tests live in /packages/ipfs/test/cli.

All interactions with IPFS core are stubbed, so we just ensure that the correct arguments are passed in.

HTTP API

Tests live in /packages/ipfs/test/http-api and are similar to the CLI tests in that we stub out core interactions and inject requests with shot.

Core

Anything that is not implementation-specific should be considered part of the 'Core APIs'. For example, node setup code is not Core, but anything that does useful work (e.g. network/repo/etc interactions) would be Core.

All Core APIs should be documented in /docs/core-api.

All Core APIs should have comprehensive tests in /packages/interface-ipfs-core.

interface-ipfs-core should ensure API compatibility across implementations. Tests are run:

  1. Against /packages/ipfs/src/core directly
  2. Against /packages/ipfs/src/http over HTTP via ipfs-http-client
  3. Against go-ipfs over HTTP via ipfs-http-client

Non-Core

Any non-core API functionality is tested in /packages/ipfs-http-api/tests and /packages/ipfs/tests for ipfs-http-client and ipfs respectively.

@jacobheun
Contributor

What's the status of this? Do we intend to finish it?

rvagg (Member, Author) commented Dec 17, 2020

We have a replacement CAR library now, https://github.com/ipld/js-car, so that needs to be integrated here. It uses the new js-multiformats though, so it'll start bringing in some new stack pieces.

Is this work needed though? I don't have a feel for whether this kind of parity with go-ipfs is even the goal these days?

@rvagg rvagg closed this Jun 28, 2021
@rvagg rvagg deleted the rvagg/import-export branch June 28, 2021 06:10
rvagg (Member, Author) commented Jun 28, 2021

had to start this again because so much has moved on, both here and in ipld/multiformats

achingbrain added a commit that referenced this pull request Jul 27, 2021
Adds `ipfs.dag.import` and `ipfs.dag.export` commands to import/export CAR files,
e.g. single-file archives that contain blocks and root CIDs.

Supersedes #2953
Fixes #2745

Co-authored-by: achingbrain <alex@achingbrain.net>
SgtPooki referenced this pull request in ipfs/js-kubo-rpc-client Aug 18, 2022
Successfully merging this pull request may close these issues.

feat: import/export a DAG from/to a CAR file
5 participants