
switch dag put cmd to directly use prime #7995

Merged
32 commits merged into feat/ipld-in-ipfs from feat/dag-put-ipld-prime on Jul 30, 2021

Conversation

willscott
Contributor

This is the only non-plugin use of the core/coredag package.

The only thing we need to understand is that there appears to be a provision where a DAG that is represented by a single block in the import codec may become multiple blocks in its serialized format codec. The previous code was able to parse input into multiple blocks, although how such boundaries are realized in practice needs deeper investigation.

@willscott
Contributor Author

Notably, the failure is that coredag sets up parsers for json and cbor data, which we don't have prime codecs for (we have dag-json and dag-cbor).

@hannahhoward
Contributor

hannahhoward commented Mar 23, 2021

@willscott As best I can tell the currently accepted translations are:

  • "json" -> ["cbor", "dag-cbor", "protobuf", "dag-pb"]
  • "raw" -> ["cbor", "dag-cbor", "protobuf", "dag-pb", "raw"]
  • "cbor" -> ["cbor", "dag-cbor"]
  • "protobuf" -> ["protobuf", "dag-pb"]

However, it would be inaccurate to say that it properly supports the above-referenced multicodecs, as specified in the multicodec table, as inputs and outputs. Several of the translations are not what they might appear to be.

Here's what I can see:


"json" -> "cbor", "dag-cbor" uses ipldcbor.FromJSON, which actually interprets the JSON as DAG-JSON (i.e. does the "/" map conversion - https://github.com/ipfs/go-ipld-cbor/blob/f689d2bb3874cf3fafb71721cafb2c945234e781/node.go#L481)


"json" -> "protobuf", "dag-pb" uses ProtoNode.UnmarshalJSON (https://github.com/ipfs/go-merkledag/blob/8f475e5385e2f262e2e07188817966bd91a6e9f0/node.go#L279) which um... behaves pretty weird. 1) it will only properly translate JSON which fits the DagPB structure in the sense that passes this struct:

	s := struct {
		Data  []byte         `json:"data"`
		Links []*format.Link `json:"links"`
	}{}

to Go's JSON unmarshaller. I don't know exactly what that qualifies as other than just "not standard anything": it will convert a link string to a CID, but only if it happens to fit the right format in that structure. Note that the field names in format.Link are NOT the same as those in the dag-pb spec, FWIW.
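
To make the oddity concrete, here's a rough, self-contained sketch of the kind of JSON that struct accepts; the link type below is a simplified stand-in for format.Link (roughly Name/Size/Cid, not the dag-pb spec names), and the input values are made up:

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // rough stand-in for format.Link; the real type holds a cid.Cid, which marshals
    // to JSON as {"/": "<cid>"}, simplified to a plain map here
    type link struct {
        Name string
        Size uint64
        Cid  map[string]string
    }

    func main() {
        // hypothetical input: only JSON shaped exactly like this round-trips sensibly
        in := `{"data":"CAE=","links":[{"Name":"child","Size":10,"Cid":{"/":"QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n"}}]}`
        s := struct {
            Data  []byte  `json:"data"`
            Links []*link `json:"links"`
        }{}
        if err := json.Unmarshal([]byte(in), &s); err != nil {
            panic(err)
        }
        // note: "data" is decoded as base64 by encoding/json, another non-obvious convention
        fmt.Printf("data=%x link=%q -> %s\n", s.Data, s.Links[0].Name, s.Links[0].Cid["/"])
    }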


"cbor" -> "cbor", "dag-cbor" just uses the same ipld.Decode function -- meaning it really interprets cbor as dag-cbor.


"protobuf" -> "protobuf", "dab-pb" uses merkledag.DecodeProtobuf (https://github.com/ipfs/go-merkledag/blob/master/coding.go#L110) which will actually ONLY deserialize dag-pb -- every other protobuf will fail


Long and short, the current translation table is far from a set of standardized multicodec translations. Even if we added proper cbor / json multicodec plugins to IPLD Prime, we would NOT end up replicating the current behavior.

I think we either need to accept a breaking change here, or we need to accept that this is not a multicodec translator and find a different way to replicate current behavior. (or deprecate these commands and make new ones)

cc: @warpfork @aschmahmann @Stebalien for further input

@willscott
Contributor Author

My current takeaway from my conversations on this with @aschmahmann and @Stebalien is that, if we are able to support the current translation functionality and can document the new command invocations that replace the current ones, it would be acceptable to change the names of codecs so that they are specified by their canonical multicodec names.

@willscott
Contributor Author

willscott commented Mar 24, 2021

the remaining relevant sharness test failures include:

ipfs dag put --format=git --input-enc=zlib {} \; -exec echo \; > hashes
  
Error: zlib is not a valid codec name

expecting success: 
  dag_hash=$(ipfs dag put <<<"{\"i\": {\"j\": {\"k\": \"asdfasdfasdf\"}}}")

Error: unknown multihash code 18446744073709551615 (0xffffffffffffffff): no such hash registered
not ok 4 - resolve: prepare dag

expecting success: 
    IPLDHASH=$(cat ipld_object | ipfs dag put -f protobuf)
  
Error: no encoder registered for multicodec code 80 (0x50)
not ok 5 - can add an ipld object using protobuf
#	
#	    IPLDHASH=$(cat ipld_object | ipfs dag put -f protobuf)
#	  

expecting success: 
    EXPHASH="QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n"
    test $EXPHASH = $IPLDHASH
  
lib/sharness/sharness.sh: line 378: test: QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n: unary operator expected
not ok 6 - output looks correct
#	
#	    EXPHASH="QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n"
#	    test $EXPHASH = $IPLDHASH
#	  

expecting success: 
    IPLDHASHb32=$(cat ipld_object | ipfs dag put -f protobuf --cid-base=base32)
  
Error: no encoder registered for multicodec code 80 (0x50)
not ok 7 - can add an ipld object using protobuf and --cid=base=base32
#	
#	    IPLDHASHb32=$(cat ipld_object | ipfs dag put -f protobuf --cid-base=base32)
#	  

expecting success: 
    test $EXPHASH = $IPLDHASHb32
  
lib/sharness/sharness.sh: line 377: test: QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n: unary operator expected
not ok 8 - output looks correct (does not upgrade to CIDv1)
#	
#	    test $EXPHASH = $IPLDHASHb32
#	  

expecting success: 
    IPLDHASH=$(cat ipld_object | ipfs dag put)
  
Error: unknown multihash code 18446744073709551615 (0xffffffffffffffff): no such hash registered
not ok 9 - can add an ipld object
#	
#	    IPLDHASH=$(cat ipld_object | ipfs dag put)
#	  

expecting success: 
    EXPHASH="bafyreidjtjfmavdk7epvztob2m5vlm3pxp3gjmpyewro4qlbw5n4f4iz64"
    test $EXPHASH = $IPLDHASH
  
not ok 10 - output looks correct
#	
#	    EXPHASH="bafyreidjtjfmavdk7epvztob2m5vlm3pxp3gjmpyewro4qlbw5n4f4iz64"
#	    test $EXPHASH = $IPLDHASH
#	  
lib/sharness/sharness.sh: line 378: test: bafyreidjtjfmavdk7epvztob2m5vlm3pxp3gjmpyewro4qlbw5n4f4iz64: unary operator expected

expecting success: 
    IPLDHASHb32=$(cat ipld_object | ipfs dag put --cid-base=base32)
  
Error: unknown multihash code 18446744073709551615 (0xffffffffffffffff): no such hash registered
not ok 11 - can add an ipld object using --cid-base=base32
#	
#	    IPLDHASHb32=$(cat ipld_object | ipfs dag put --cid-base=base32)
#	  

expecting success: 
    test $(ipfs cid base32 $EXPHASH) = $IPLDHASHb32
  
lib/sharness/sharness.sh: line 377: test: bafyreidjtjfmavdk7epvztob2m5vlm3pxp3gjmpyewro4qlbw5n4f4iz64: unary operator expected
not ok 12 - output looks correct


expecting success: 
    HASH=$(cat ../t0053-dag-data/non-canon.cbor | ipfs dag put --format=cbor --input-enc=raw) &&
    test $HASH = "bafyreiawx7ona7oa2ptcoh6vwq4q6bmd7x2ibtkykld327bgb7t73ayrqm" ||
    test_fsh echo $HASH
  
Error: unknown multihash code 18446744073709551615 (0xffffffffffffffff): no such hash registered
> echo


not ok 27 - non-canonical cbor input is normalized

expecting success: 
    HASH=$(cat ../t0053-dag-data/non-canon.cbor | ipfs dag put --format=cbor --input-enc=cbor) &&
    test $HASH = "bafyreiawx7ona7oa2ptcoh6vwq4q6bmd7x2ibtkykld327bgb7t73ayrqm" ||
    test_fsh echo $HASH
  
Error: unhandled cbor tag 42
> echo


not ok 28 - non-canonical cbor input is normalized with input-enc cbor


expecting success: 
    ipfs dag get $HASH > pbjson &&
    cat pbjson | ipfs dag put --format=dag-pb --input-enc=json > dag_put_out
  
Error: invalid key for map dagpb.PBNode: "data": no such field
not ok 34 - dag put with json dag-pb works

expecting success: 
    ipfs block get $HASH > pbraw &&
    cat pbraw | ipfs dag put --format=dag-pb --input-enc=raw > dag_put_out
  
Error: func called on wrong kind: AssignNode called on a dagpb.PBNode node (kind: bytes), but only makes sense on map
not ok 36 - dag put with raw dag-pb works


@Stebalien
Member

Please feel comfortable breaking these conversions. We'll announce the breakages in the release notes and, honestly, I can't think of a single user who won't simply be happy that the API has improved.

@rvagg
Member

rvagg commented Jul 15, 2021

With ipld/go-ipld-prime#202 and ipld/go-ipld-prime#203 we mostly get down to DAG-PB form differences, which will mean fixing up the test fixtures. But there are also the newlines, which the current dag get prints thanks to (I think) the difference between what cmds.EmitOnce(res, &out) and res.Emit(r) are doing. This fixes some of the test cases that require a newline, but are we happy with this? Do we want the user to be able to get a "pure" output, with newline characters only where they are properly part of the data, or would we expect them to strip newlines themselves when redirecting to a file or piping to some other command that requires pure output?

diff --git a/core/commands/dag/get.go b/core/commands/dag/get.go
index cef5605e1..79e355bed 100644
--- a/core/commands/dag/get.go
+++ b/core/commands/dag/get.go
@@ -70,6 +70,7 @@ func dagGet(req *cmds.Request, res cmds.ResponseEmitter, env cmds.Environment) e
                if err := encoder(finalNode, w); err != nil {
                        _ = res.CloseWithError(err)
                }
+               w.Write([]byte{'\n'})
        }()

        return res.Emit(r)

I think maybe this isn't appropriate and we should fix the tests to not have newlines. If I can now dag get a DAG-CBOR object then I should be able to redirect it to a file and not have to deal with a stray newline in there when trying to decode it, same with piping to another command, like a CBOR diagnostic printer.

Thoughts?

@willscott
Contributor Author

I noticed that one as well. I think I'm willing to say that not adding the extra newline here is the right behavior.

@rvagg
Member

rvagg commented Jul 15, 2021

+1 to not adding, I'm removing newlines from fixtures.

Next question: is this migration also a migration to CIDv1? Some fixtures depend on output from ipfs add to compare against CIDs from ipfs dag put, so we have a v0/v1 mismatch. But making add spit out v1 CIDs causes lots of other fixtures to fail because of the default to v0 everywhere. I assume we're trying to be minimal here and not go whole-hog on migration? Do we just fix up the fixtures and sharness to do the conversion for us?
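
If we just fix up the fixtures, the conversion itself is mechanical; here's a minimal sketch of the v0-to-v1 re-wrapping with go-cid (the CID is borrowed from the fixtures above, and this mirrors what ipfs cid base32 is used for in the tests):

    package main

    import (
        "fmt"

        cid "github.com/ipfs/go-cid"
    )

    func main() {
        // CIDv0 as produced by ipfs add with today's defaults
        v0, err := cid.Decode("QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n")
        if err != nil {
            panic(err)
        }
        // re-wrap the same codec + multihash as a CIDv1, which stringifies as base32 by default
        v1 := cid.NewCidV1(v0.Type(), v0.Hash())
        fmt.Println(v1)
    }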

@willscott
Contributor Author

I don't think we tackled that can of worms in the ipld-in-ipfs work. There's other datastore work that needs to happen as well before we can switch entirely to CIDv1.

@masih
Member

masih commented Jul 22, 2021

I can confirm this PR fixes the newline issue captured in #3503.
Please note that a sharness test was added via #8280 to capture the known breakage; it needs to be flipped to test_expect_success here once this PR is rebased on master.

Nice one 🍻

@rvagg
Member

rvagg commented Jul 26, 2021

IMO this is good to merge; sharness tests are passing. The go-ipfs-http-client error is ipfs/interface-go-ipfs-core#75, but that's also failing in the branch we're merging into (#7976), which doesn't have CircleCI tests run because it's a draft.

I think we should merge this branch and continue work on #7976.

@willscott
Contributor Author

@aschmahmann do you want to read through this user-facing change-set in isolation before we merge it into the target merge branch?

@aschmahmann
Contributor

Sure, that makes sense. Do the test modifications cover all the user-facing changes?

@rvagg
Member

rvagg commented Jul 26, 2021

@aschmahmann yes, the changes reflected in the tests can be summed up roughly as:

  • output now omits the trailing newline character, so we get exactly the right bytes for whatever codec you use
  • proper dag-json output, as well as proper dag-pb forms, means some of the JSON outputs are different now
  • t0053-dag.sh has some changes which I think I'd like to revert when "Sort map entries marshalling dag-cbor" (ipld/go-ipld-prime#204) is merged and we get proper dag-cbor sorting, but it also has a bunch of additions that exercise different parts of the dag encode/decode flow and the main codecs
  • as the git plugin test demonstrates, there's now no auto-detection of the hash function from the codec, so a git import has to have sha1 specified manually (see the sketch after this list)
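
For reference, a rough sketch of what the explicit codec-plus-hash pairing looks like at the go-cid level for a git-raw block; the object bytes are a made-up placeholder and this is just an illustration, not the plugin's actual code:

    package main

    import (
        "fmt"

        cid "github.com/ipfs/go-cid"
        mh "github.com/multiformats/go-multihash"
    )

    func main() {
        // placeholder bytes standing in for an inflated git object
        rawGitObject := []byte("blob 5\x00hello")

        prefix := cid.Prefix{
            Version:  1,
            Codec:    cid.GitRaw, // 0x78
            MhType:   mh.SHA1,    // has to be stated explicitly now; no codec-based default
            MhLength: -1,         // default digest length
        }
        c, err := prefix.Sum(rawGitObject)
        if err != nil {
            panic(err)
        }
        fmt.Println(c) // a CIDv1 with the git-raw codec and a sha1 multihash
    }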

@rvagg
Member

rvagg commented Jul 27, 2021

Updated ipld-prime with dag-cbor sorting and reverted some sharness fixtures back to the originals, so there's even less diff now; I'll comment inline on a few things.

test_cmp cat_exp cat_out
'

test_expect_success "non-canonical cbor input is normalized" '
HASH=$(cat ../t0053-dag-data/non-canon.cbor | ipfs dag put --format=cbor --input-enc=raw) &&
Member

I guess "cbor" used to mean dag-cbor? I'm not sure how "raw" was working here though since non-canon.cbor is not raw, but the CID is correct for dag-cbor. (btw I've verified many of the CIDs against the JS implementations of our codecs too so we have agreement).

Member

So the incoming diff replacing these lines, which says dag-cbor in the test name, --format, and --input-enc, now seems correct, right? I agree the outgoing diff was confusing and quite likely what I'd call wrong.

test_expect_success "output looks correct" '
EXPHASH="QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n"
test_expect_success "CID looks correct" '
EXPHASH="bafyreiblwimnjbqcdoeafiobk6q27jcw64ew7n2fmmhdpldd63edmjecde"
Member

this is not the CIDv1 form of QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n. The input object on line 25 contains "beep":[0,"bop"], and current ipfs encodes this 0 as a float64(0) (fb0000000000000000), whereas the new dag-cbor does the proper thing and uses the smallest possible int encoding (00). I don't know why it's interpreting an int as a float; I might look into that because it's a bit odd, but at least now it's doing the right thing and it matches the JS impl.

'

test_expect_success "output looks correct" '
EXPHASH="bafyriqgae54zjl3bjebmbat2rjem4ewj6vni6jxohandmvk3bibfgv3sioyeidppsghvulryxats43br3b7afa6jy77x6gqzqaicer6ljicck"
Member

this CID is different for the reason above re float64(0) vs int(0) encoding. I've verified the new one against the JS codec and hasher's output and it's correct.

Comment on lines -274 to +322
echo "Size: 15, NumBlocks: 1" > exp_stat_inner_ipld_obj &&
echo "Size: 8, NumBlocks: 1" > exp_stat_inner_ipld_obj &&
test_cmp exp_stat_inner_ipld_obj actual_stat_inner_ipld_obj &&
ipfs dag stat $HASH > actual_stat_ipld_obj &&
echo "Size: 61, NumBlocks: 2" > exp_stat_ipld_obj &&
echo "Size: 54, NumBlocks: 2" > exp_stat_ipld_obj &&
Member

@rvagg Jul 27, 2021


iirc these differences come entirely from {"data":123} being encoded properly now, a16464617461187b:

a1                                                # map(1)
  64                                              #   string(4)
    64617461                                      #     "data"
  18 7b                                           #   uint(123)

whereas the current impl is doing the float thing again: a16464617461fb405ec00000000000:

a1                                                # map(1)
  64                                              #   string(4)
    64617461                                      #     "data"
  fb 405ec00000000000                             #   float(123)

so we're losing 7 bytes on that block, then the same number on the cumulative count for the block that links to it
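
For anyone who wants to reproduce that, here's a small sketch using go-ipld-prime's dag-cbor codec (assuming a reasonably recent go-ipld-prime where dagcbor.Encode and node/basicnode are available):

    package main

    import (
        "bytes"
        "encoding/hex"
        "fmt"

        "github.com/ipld/go-ipld-prime/codec/dagcbor"
        "github.com/ipld/go-ipld-prime/fluent"
        "github.com/ipld/go-ipld-prime/node/basicnode"
    )

    func main() {
        // build the data model map {"data": 123} and encode it as dag-cbor
        n := fluent.MustBuildMap(basicnode.Prototype.Map, 1, func(ma fluent.MapAssembler) {
            ma.AssembleEntry("data").AssignInt(123)
        })
        var buf bytes.Buffer
        if err := dagcbor.Encode(n, &buf); err != nil {
            panic(err)
        }
        fmt.Println(hex.EncodeToString(buf.Bytes())) // expect a16464617461187b (int, not float)
    }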

@willscott
Contributor Author

@rvagg, not having seen anything further raised here for 3 days, I think we're okay to merge this PR into the main target merge branch.

@rvagg
Member

rvagg commented Jul 30, 2021

Agreed. @hannahhoward and @aschmahmann, we'll merge this and continue working on the feat/ipld-in-ipfs branch via #7976.

@rvagg merged commit 70eed4e into feat/ipld-in-ipfs on Jul 30, 2021
@rvagg deleted the feat/dag-put-ipld-prime branch on July 30, 2021 at 12:06