
switch dag put cmd to directly use prime #7995

Merged
32 commits merged into feat/ipld-in-ipfs from feat/dag-put-ipld-prime on Jul 30, 2021

Conversation

willscott
Contributor

This is the only non-plugin use of the core/coredag package.

The only thing we need to understand is that there appears to be a provision where a DAG that is represented by a single block in the import codec may become multiple blocks in its serialized format codec. The previous code was able to parse input into multiple blocks, although how such boundaries are realized in practice needs deeper investigation.

@willscott
Contributor Author

Notably, the failure is that coredag sets up parsers for json and cbor data, which we don't have prime codecs for (we have dag-json and dag-cbor).

@hannahhoward
Contributor

hannahhoward commented Mar 23, 2021

@willscott As best I can tell the currently accepted translations are:

  • "json" -> ["cbor", "dag-cbor", "protobuf", "dag-pb"]
  • "raw" -> ["cbor", "dag-cbor", "protobuf", "dag-pb", "raw"]
  • "cbor" -> ["cbor", "dag-cbor"]
  • "protobuf" -> ["protobuf", "dag-pb"]

However, it would be inaccurate to say that it properly supports the above-referenced multicodecs, as specified in the multicodec table, as inputs and outputs. Several of the translations are not what they might appear to be.

Here's what I can see:


"json" -> "cbor", "dag-cbor" uses ipldcbor.FromJSON, which actually interprets the JSON as DAG-JSON (i.e. does the "/" map conversion - https://github.com/ipfs/go-ipld-cbor/blob/f689d2bb3874cf3fafb71721cafb2c945234e781/node.go#L481)


"json" -> "protobuf", "dag-pb" uses ProtoNode.UnmarshalJSON (https://github.com/ipfs/go-merkledag/blob/8f475e5385e2f262e2e07188817966bd91a6e9f0/node.go#L279) which um... behaves pretty weird. 1) it will only properly translate JSON which fits the DagPB structure in the sense that passes this struct:

	s := struct {
		Data  []byte         `json:"data"`
		Links []*format.Link `json:"links"`
	}{}

to Go's JSON unmarshaller. I don't know exactly what that qualifies as other than just "not standard anything": it will convert a link string to a CID, but only if it happens to fit the right format in that structure. Note that the field names in format.Link are NOT the same as those in the dag-pb spec, FWIW.
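
To make the oddity concrete, here's a rough, self-contained sketch of the kind of JSON that struct accepts; the link type below is a simplified stand-in for format.Link (roughly Name/Size/Cid, not the dag-pb spec names), and the input values are made up:

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // rough stand-in for format.Link; the real type holds a cid.Cid, which marshals
    // to JSON as {"/": "<cid>"}, simplified to a plain map here
    type link struct {
        Name string
        Size uint64
        Cid  map[string]string
    }

    func main() {
        // hypothetical input: only JSON shaped exactly like this round-trips sensibly
        in := `{"data":"CAE=","links":[{"Name":"child","Size":10,"Cid":{"/":"QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n"}}]}`
        s := struct {
            Data  []byte  `json:"data"`
            Links []*link `json:"links"`
        }{}
        if err := json.Unmarshal([]byte(in), &s); err != nil {
            panic(err)
        }
        // note: "data" is decoded as base64 by encoding/json, another non-obvious convention
        fmt.Printf("data=%x link=%q -> %s\n", s.Data, s.Links[0].Name, s.Links[0].Cid["/"])
    }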


"cbor" -> "cbor", "dag-cbor" just uses the same ipld.Decode function -- meaning it really interprets cbor as dag-cbor.


"protobuf" -> "protobuf", "dab-pb" uses merkledag.DecodeProtobuf (https://github.com/ipfs/go-merkledag/blob/master/coding.go#L110) which will actually ONLY deserialize dag-pb -- every other protobuf will fail


Long and short, the current translation table is far from a set of standardized multicodec translations. Even if we added proper cbor / json multicodec plugins to IPLD Prime, we would NOT end up replicating the current behavior.

I think we either need to accept a breaking change here, or we need to accept that this is not a multicodec translator and find a different way to replicate current behavior. (or deprecate these commands and make new ones)

cc: @warpfork @aschmahmann @Stebalien for further input

@willscott
Contributor Author

My current takeaway from my conversations on this with @aschmahmann and @Stebalien is that, if we are able to support the current translation functionality and can document the new command invocations that replace the current ones, it would be acceptable to change the names of codecs so that they are specified by their canonical multicodec names.

@willscott
Contributor Author

willscott commented Mar 24, 2021

the remaining relevant sharness test failures include:

ipfs dag put --format=git --input-enc=zlib {} \; -exec echo \; > hashes
  
Error: zlib is not a valid codec name

expecting success: 
  dag_hash=$(ipfs dag put <<<"{\"i\": {\"j\": {\"k\": \"asdfasdfasdf\"}}}")

Error: unknown multihash code 18446744073709551615 (0xffffffffffffffff): no such hash registered
not ok 4 - resolve: prepare dag

expecting success: 
    IPLDHASH=$(cat ipld_object | ipfs dag put -f protobuf)
  
Error: no encoder registered for multicodec code 80 (0x50)
not ok 5 - can add an ipld object using protobuf
#	
#	    IPLDHASH=$(cat ipld_object | ipfs dag put -f protobuf)
#	  

expecting success: 
    EXPHASH="QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n"
    test $EXPHASH = $IPLDHASH
  
lib/sharness/sharness.sh: line 378: test: QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n: unary operator expected
not ok 6 - output looks correct
#	
#	    EXPHASH="QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n"
#	    test $EXPHASH = $IPLDHASH
#	  

expecting success: 
    IPLDHASHb32=$(cat ipld_object | ipfs dag put -f protobuf --cid-base=base32)
  
Error: no encoder registered for multicodec code 80 (0x50)
not ok 7 - can add an ipld object using protobuf and --cid=base=base32
#	
#	    IPLDHASHb32=$(cat ipld_object | ipfs dag put -f protobuf --cid-base=base32)
#	  

expecting success: 
    test $EXPHASH = $IPLDHASHb32
  
lib/sharness/sharness.sh: line 377: test: QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n: unary operator expected
not ok 8 - output looks correct (does not upgrade to CIDv1)
#	
#	    test $EXPHASH = $IPLDHASHb32
#	  

expecting success: 
    IPLDHASH=$(cat ipld_object | ipfs dag put)
  
Error: unknown multihash code 18446744073709551615 (0xffffffffffffffff): no such hash registered
not ok 9 - can add an ipld object
#	
#	    IPLDHASH=$(cat ipld_object | ipfs dag put)
#	  

expecting success: 
    EXPHASH="bafyreidjtjfmavdk7epvztob2m5vlm3pxp3gjmpyewro4qlbw5n4f4iz64"
    test $EXPHASH = $IPLDHASH
  
not ok 10 - output looks correct
#	
#	    EXPHASH="bafyreidjtjfmavdk7epvztob2m5vlm3pxp3gjmpyewro4qlbw5n4f4iz64"
#	    test $EXPHASH = $IPLDHASH
#	  
lib/sharness/sharness.sh: line 378: test: bafyreidjtjfmavdk7epvztob2m5vlm3pxp3gjmpyewro4qlbw5n4f4iz64: unary operator expected

expecting success: 
    IPLDHASHb32=$(cat ipld_object | ipfs dag put --cid-base=base32)
  
Error: unknown multihash code 18446744073709551615 (0xffffffffffffffff): no such hash registered
not ok 11 - can add an ipld object using --cid-base=base32
#	
#	    IPLDHASHb32=$(cat ipld_object | ipfs dag put --cid-base=base32)
#	  

expecting success: 
    test $(ipfs cid base32 $EXPHASH) = $IPLDHASHb32
  
lib/sharness/sharness.sh: line 377: test: bafyreidjtjfmavdk7epvztob2m5vlm3pxp3gjmpyewro4qlbw5n4f4iz64: unary operator expected
not ok 12 - output looks correct


expecting success: 
    HASH=$(cat ../t0053-dag-data/non-canon.cbor | ipfs dag put --format=cbor --input-enc=raw) &&
    test $HASH = "bafyreiawx7ona7oa2ptcoh6vwq4q6bmd7x2ibtkykld327bgb7t73ayrqm" ||
    test_fsh echo $HASH
  
Error: unknown multihash code 18446744073709551615 (0xffffffffffffffff): no such hash registered
> echo


not ok 27 - non-canonical cbor input is normalized

expecting success: 
    HASH=$(cat ../t0053-dag-data/non-canon.cbor | ipfs dag put --format=cbor --input-enc=cbor) &&
    test $HASH = "bafyreiawx7ona7oa2ptcoh6vwq4q6bmd7x2ibtkykld327bgb7t73ayrqm" ||
    test_fsh echo $HASH
  
Error: unhandled cbor tag 42
> echo


not ok 28 - non-canonical cbor input is normalized with input-enc cbor


expecting success: 
    ipfs dag get $HASH > pbjson &&
    cat pbjson | ipfs dag put --format=dag-pb --input-enc=json > dag_put_out
  
Error: invalid key for map dagpb.PBNode: "data": no such field
not ok 34 - dag put with json dag-pb works

expecting success: 
    ipfs block get $HASH > pbraw &&
    cat pbraw | ipfs dag put --format=dag-pb --input-enc=raw > dag_put_out
  
Error: func called on wrong kind: AssignNode called on a dagpb.PBNode node (kind: bytes), but only makes sense on map
not ok 36 - dag put with raw dag-pb works


@Stebalien
Member

Please feel comfortable breaking these conversions. We'll announce the breakages in the release notes and, honestly, I can't think of a single user who won't simply be happy that the API has improved.

@rvagg
Member

rvagg commented Jul 15, 2021

With ipld/go-ipld-prime#202 and ipld/go-ipld-prime#203 we mostly get down to DAG-PB form differences, which will mean fixing up the test fixtures. But there are also the newlines, which the current dag get prints thanks to (I think) the difference between what cmds.EmitOnce(res, &out) and res.Emit(r) are doing. This fixes some of the test cases that require a newline, but are we happy with this? Do we want the user to be able to get a "pure" output, with newline characters only where they are properly part of the data, or would we expect them to strip newlines themselves when redirecting to a file or piping to some other command that requires pure output?

diff --git a/core/commands/dag/get.go b/core/commands/dag/get.go
index cef5605e1..79e355bed 100644
--- a/core/commands/dag/get.go
+++ b/core/commands/dag/get.go
@@ -70,6 +70,7 @@ func dagGet(req *cmds.Request, res cmds.ResponseEmitter, env cmds.Environment) e
                if err := encoder(finalNode, w); err != nil {
                        _ = res.CloseWithError(err)
                }
+               w.Write([]byte{'\n'})
        }()

        return res.Emit(r)

I think maybe this isn't appropriate and we should fix the tests to not have newlines. If I can now dag get a DAG-CBOR object then I should be able to redirect it to a file and not have to deal with a stray newline in there when trying to decode it, same with piping to another command, like a CBOR diagnostic printer.

Thoughts?

@willscott
Contributor Author

I noticed that one as well. I think I'm willing to say that not adding the extra newline here is the right behavior.

@rvagg
Member

rvagg commented Jul 15, 2021

+1 to not adding, I'm removing newlines from fixtures.

Next question: is this migration also a migration to CIDv1? Some fixtures depend on output from ipfs add to compare against CIDs from ipfs dag put, so we have a v0/v1 mismatch. But making add spit out v1 CIDs causes lots of other fixtures to fail because of the default to v0 everywhere. I assume we're trying to be minimal here and not go whole-hog on migration? Do we just fix up the fixtures and sharness to do the conversion for us?
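
If we just fix up the fixtures, the conversion itself is mechanical; here's a minimal sketch of the v0-to-v1 re-wrapping with go-cid (the CID is borrowed from the fixtures above, and this mirrors what ipfs cid base32 is used for in the tests):

    package main

    import (
        "fmt"

        cid "github.com/ipfs/go-cid"
    )

    func main() {
        // CIDv0 as produced by ipfs add with today's defaults
        v0, err := cid.Decode("QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n")
        if err != nil {
            panic(err)
        }
        // re-wrap the same codec + multihash as a CIDv1, which stringifies as base32 by default
        v1 := cid.NewCidV1(v0.Type(), v0.Hash())
        fmt.Println(v1)
    }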

@willscott
Contributor Author

I don't think we tackled that can of worms in the ipld-in-ipfs work. There's other datastore work that needs to happen as well before we can switch entirely to CIDv1.

@masih
Member

masih commented Jul 22, 2021

I can confirm this PR fixes the newline issue captured in #3503.
Please note that a sharness test was added via #8280 to capture the known breakage; it needs to be flipped to test_expect_success here once this PR is rebased on master.

Nice one 🍻

@rvagg
Member

rvagg commented Jul 26, 2021

IMO this is good to merge; sharness tests are passing. The go-ipfs-http-client error is ipfs/interface-go-ipfs-core#75, but that's also failing in the branch we're merging into (#7976), which doesn't have CircleCI tests run because it's a draft.

I think we should merge this branch and continue work on #7976.

@willscott
Contributor Author

@aschmahmann do you want to read through this user-facing change-set in isolation before we merge it into the target merge branch?

@aschmahmann
Contributor

Sure, that makes sense. Do the test modifications cover all the user-facing changes?

@rvagg
Member

rvagg commented Jul 26, 2021

@aschmahmann yes, the changes reflected in the tests can be summed up roughly as:

  • output now omits the trailing newline character, so we get exactly the right bytes for whatever codec you use
  • proper dag-json output, as well as proper dag-pb forms, means some of the JSON outputs are different now
  • t0053-dag.sh has some changes which I think I'd like to revert when "Sort map entries marshalling dag-cbor" (ipld/go-ipld-prime#204) is merged and we get proper dag-cbor sorting, but it also has a bunch of additions that exercise different parts of the dag encode/decode flow and the main codecs
  • as the git plugin test demonstrates, there's now no auto-detection of the hash function from the codec, so a git import has to have sha1 specified manually (see the sketch after this list)
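
For reference, a rough sketch of what the explicit codec-plus-hash pairing looks like at the go-cid level for a git-raw block; the object bytes are a made-up placeholder and this is just an illustration, not the plugin's actual code:

    package main

    import (
        "fmt"

        cid "github.com/ipfs/go-cid"
        mh "github.com/multiformats/go-multihash"
    )

    func main() {
        // placeholder bytes standing in for an inflated git object
        rawGitObject := []byte("blob 5\x00hello")

        prefix := cid.Prefix{
            Version:  1,
            Codec:    cid.GitRaw, // 0x78
            MhType:   mh.SHA1,    // has to be stated explicitly now; no codec-based default
            MhLength: -1,         // default digest length
        }
        c, err := prefix.Sum(rawGitObject)
        if err != nil {
            panic(err)
        }
        fmt.Println(c) // a CIDv1 with the git-raw codec and a sha1 multihash
    }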

@rvagg
Member

rvagg commented Jul 27, 2021

Updated ipld-prime with dag-cbor sorting and reverted some sharness fixtures back to the originals, so there's even less diff now; I'll comment inline on a few things.

test_cmp cat_exp cat_out
'

test_expect_success "non-canonical cbor input is normalized" '
HASH=$(cat ../t0053-dag-data/non-canon.cbor | ipfs dag put --format=cbor --input-enc=raw) &&
Member

I guess "cbor" used to mean dag-cbor? I'm not sure how "raw" was working here though since non-canon.cbor is not raw, but the CID is correct for dag-cbor. (btw I've verified many of the CIDs against the JS implementations of our codecs too so we have agreement).

Member

So the incoming diff replacing these lines, which says dag-cbor in the test name, --format, and --input-enc, now seems correct, right? I agree the outgoing diff was confusing and quite likely what I'd call wrong.

test_expect_success "output looks correct" '
EXPHASH="QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n"
test_expect_success "CID looks correct" '
EXPHASH="bafyreiblwimnjbqcdoeafiobk6q27jcw64ew7n2fmmhdpldd63edmjecde"
Member

this is not the CIDv1 form of QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n. The input object on line 25 contains "beep":[0,"bop"], and current ipfs encodes this 0 as a float64(0) (fb0000000000000000), whereas the new dag-cbor does the proper thing and uses the smallest possible int encoding (00). I don't know why it's interpreting an int as a float; I might look into that because it's a bit odd, but at least now it's doing the right thing and it matches the JS impl.

'

test_expect_success "output looks correct" '
EXPHASH="bafyriqgae54zjl3bjebmbat2rjem4ewj6vni6jxohandmvk3bibfgv3sioyeidppsghvulryxats43br3b7afa6jy77x6gqzqaicer6ljicck"
Member

this CID is different for the reason above re float64(0) vs int(0) encoding. I've verified the new one against the JS codec and hasher's output and it's correct.

Comment on lines -274 to +322
echo "Size: 15, NumBlocks: 1" > exp_stat_inner_ipld_obj &&
echo "Size: 8, NumBlocks: 1" > exp_stat_inner_ipld_obj &&
test_cmp exp_stat_inner_ipld_obj actual_stat_inner_ipld_obj &&
ipfs dag stat $HASH > actual_stat_ipld_obj &&
echo "Size: 61, NumBlocks: 2" > exp_stat_ipld_obj &&
echo "Size: 54, NumBlocks: 2" > exp_stat_ipld_obj &&
Member

@rvagg Jul 27, 2021


iirc these differences come entirely from {"data":123} being encoded properly now, a16464617461187b:

a1                                                # map(1)
  64                                              #   string(4)
    64617461                                      #     "data"
  18 7b                                           #   uint(123)

whereas the current impl is doing the float thing again: a16464617461fb405ec00000000000:

a1                                                # map(1)
  64                                              #   string(4)
    64617461                                      #     "data"
  fb 405ec00000000000                             #   float(123)

so we're losing 7 bytes on that block, then the same number on the cumulative count for the block that links to it
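
For anyone who wants to reproduce that, here's a small sketch using go-ipld-prime's dag-cbor codec (assuming a reasonably recent go-ipld-prime where dagcbor.Encode and node/basicnode are available):

    package main

    import (
        "bytes"
        "encoding/hex"
        "fmt"

        "github.com/ipld/go-ipld-prime/codec/dagcbor"
        "github.com/ipld/go-ipld-prime/fluent"
        "github.com/ipld/go-ipld-prime/node/basicnode"
    )

    func main() {
        // build the data model map {"data": 123} and encode it as dag-cbor
        n := fluent.MustBuildMap(basicnode.Prototype.Map, 1, func(ma fluent.MapAssembler) {
            ma.AssembleEntry("data").AssignInt(123)
        })
        var buf bytes.Buffer
        if err := dagcbor.Encode(n, &buf); err != nil {
            panic(err)
        }
        fmt.Println(hex.EncodeToString(buf.Bytes())) // expect a16464617461187b (int, not float)
    }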

@willscott
Contributor Author

@rvagg, not having seen anything further raised here for 3 days, I think we're okay to merge this PR into the main target merge branch.

@rvagg
Member

rvagg commented Jul 30, 2021

Agreed. @hannahhoward and @aschmahmann, we'll merge this and continue working on the feat/ipld-in-ipfs branch via #7976.

@rvagg merged commit 70eed4e into feat/ipld-in-ipfs on Jul 30, 2021
@rvagg deleted the feat/dag-put-ipld-prime branch on July 30, 2021 at 12:06