switch dag put cmd to directly use prime #7995
Conversation
notably, the failure is that coredag sets up parsers for
@willscott As best I can tell the currently accepted translations are:
However, it would be inaccurate to say that it properly supports the above-referenced multicodecs as specified in the multicodec table for inputs and outputs. Several of the translations are not what they might appear to be. Here's what I can see:

"json" -> "cbor", "dag-cbor" uses ipldcbor.FromJSON, which actually interprets the JSON as DAG-JSON (i.e. it does the "/" map conversion - https://github.com/ipfs/go-ipld-cbor/blob/f689d2bb3874cf3fafb71721cafb2c945234e781/node.go#L481)

"json" -> "protobuf", "dag-pb" uses ProtoNode.UnmarshalJSON (https://github.com/ipfs/go-merkledag/blob/8f475e5385e2f262e2e07188817966bd91a6e9f0/node.go#L279), which behaves pretty strangely: it will only properly translate JSON that fits the DagPB structure, in the sense that it passes this struct:

    s := struct {
        Data  []byte         `json:"data"`
        Links []*format.Link `json:"links"`
    }{}

to Go's JSON unmarshaller. I don't know exactly what that qualifies as other than just "not standard anything" - it will convert a link string to a CID, but only if it happens to fit the right format in that structure. Note the field names in format.Link are NOT the same as those in the dag-pb spec, FWIW.

"cbor" -> "cbor", "dag-cbor" just uses the same ipld.Decode function - meaning it really interprets CBOR as dag-cbor.

"protobuf" -> "protobuf", "dag-pb" uses merkledag.DecodeProtobuf (https://github.com/ipfs/go-merkledag/blob/master/coding.go#L110), which will actually ONLY deserialize dag-pb - every other protobuf will fail.

Long and short, the current translation table is far from a set of standardized multicodec translations. Even if we added proper cbor / json multicodec plugins to IPLD Prime, we would NOT end up replicating the current behavior. I think we either need to accept a breaking change here, or we need to accept that this is not a multicodec translator and find a different way to replicate current behavior. (Or deprecate these commands and make new ones.)

cc: @warpfork @aschmahmann @Stebalien for further input
My current takeaway from my conversations on this with @aschmahmann and @Stebalien is that if we are able to support the current translation functionality, and can document the new command invocations that replace the current ones, it would be acceptable to change the names of codecs so that they are specified by their canonical multicodec names.
the remaining relevant sharness test failures include:
Please feel comfortable breaking these conversions. We'll announce the breakages in the release notes and, honestly, I can't think of a single user who won't simply be happy that the API has improved.
force-pushed from 3daa431 to 0c5fb17
force-pushed from 49c78d7 to cdde999
With ipld/go-ipld-prime#202 and ipld/go-ipld-prime#203 we mostly get down to DAG-PB form differences, which will mean fixing up the test fixtures. But there's also the newline question, addressed by the current diff:

    diff --git a/core/commands/dag/get.go b/core/commands/dag/get.go
    index cef5605e1..79e355bed 100644
    --- a/core/commands/dag/get.go
    +++ b/core/commands/dag/get.go
    @@ -70,6 +70,7 @@ func dagGet(req *cmds.Request, res cmds.ResponseEmitter, env cmds.Environment) e
        if err := encoder(finalNode, w); err != nil {
            _ = res.CloseWithError(err)
        }
    +   w.Write([]byte{'\n'})
    }()
    return res.Emit(r)

I think maybe this isn't appropriate and we should fix the tests to not expect newlines. Thoughts?
I noticed that one as well. I think I'm willing to say that not adding the extra newline here is the right behavior. |
+1 to not adding, I'm removing newlines from fixtures. Next question - is this migration also a migration to CIDv1? Some fixtures depend on output from
i don't think we tackled that can of worms in the ipld-in-ipfs work. there's other datastore work that needs to happen as well to switch entirely to cidv1 |
force-pushed from 42155c4 to 6522df1
IMO this is good to merge; sharness tests are passing. The go-ipfs-http-client error is ipfs/interface-go-ipfs-core#75, but that's also failing in the branch we're merging into @ #7976, which doesn't have CircleCI tests run because it's a draft.
I think we should merge this branch and continue work on #7976.
@aschmahmann do you want to read through this user-facing change-set in isolation before we merge it into the target merge branch?
Sure, that makes sense. Do the test modifications cover all the user-facing changes?
@aschmahmann yes, the changes reflected in the tests can be summed up roughly as:
Updated ipld-prime with dag-cbor sorting, reverted some sharness fixtures back to orig so there's even less diff now, will comment inline on a few things. |
    test_cmp cat_exp cat_out
    '

    test_expect_success "non-canonical cbor input is normalized" '
    HASH=$(cat ../t0053-dag-data/non-canon.cbor | ipfs dag put --format=cbor --input-enc=raw) &&
I guess "cbor" used to mean dag-cbor? I'm not sure how "raw" was working here though since non-canon.cbor is not raw, but the CID is correct for dag-cbor. (btw I've verified many of the CIDs against the JS implementations of our codecs too so we have agreement).
So the incoming diff replacing these lines, which says dag-cbor in the name, --format, and --input-enc, now seems correct, right? Agree the outgoing diff was confusing and quite likely what I'd call wrong.
    -test_expect_success "output looks correct" '
    -EXPHASH="QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n"
    +test_expect_success "CID looks correct" '
    +EXPHASH="bafyreiblwimnjbqcdoeafiobk6q27jcw64ew7n2fmmhdpldd63edmjecde"
this is not the CIDv1 form of QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n. The input object on line 25 contains "beep":[0,"bop"] and current ipfs is encoding this 0 as a float64(0) (fb0000000000000000) whereas the new dag-cbor is doing the proper thing of smallest-possible int (00). I don't know why it's interpreting an int as a float, I might look into that because it's a bit odd, but at least now it's doing the right thing and it matches the JS impl.
    test_expect_success "output looks correct" '
    EXPHASH="bafyriqgae54zjl3bjebmbat2rjem4ewj6vni6jxohandmvk3bibfgv3sioyeidppsghvulryxats43br3b7afa6jy77x6gqzqaicer6ljicck"
this CID is different for the reason above re float64(0) vs int(0) encoding. I've verified the new one against the JS codec & hasher's output and it's correct
    -echo "Size: 15, NumBlocks: 1" > exp_stat_inner_ipld_obj &&
    +echo "Size: 8, NumBlocks: 1" > exp_stat_inner_ipld_obj &&
     test_cmp exp_stat_inner_ipld_obj actual_stat_inner_ipld_obj &&
     ipfs dag stat $HASH > actual_stat_ipld_obj &&
    -echo "Size: 61, NumBlocks: 2" > exp_stat_ipld_obj &&
    +echo "Size: 54, NumBlocks: 2" > exp_stat_ipld_obj &&
iirc these differences come entirely from {"data":123} being encoded properly now, a16464617461187b:

    a1          # map(1)
      64        # string(4)
      64617461  # "data"
      18 7b     # uint(123)

whereas the current impl is doing the float thing again: a16464617461fb405ec00000000000:

    a1                    # map(1)
      64                  # string(4)
      64617461            # "data"
      fb 405ec00000000000 # float(123)

so we're losing 7 bytes on that block, then the same number on the cumulative count for the block that links to it.
@rvagg not seeing anything further raised here for 3 days, I think we're okay to merge this PR into the main target merge branch.
agreed, @hannahhoward and @aschmahmann we'll merge this and continue working on the
This is the only non-plugin use of the core/coredag package. The only thing we need to understand is that there appears to be a provision where a DAG that is represented by a single block in the import codec may become multiple blocks in its serialized format codec. The previous code was able to parse input into multiple blocks, although it needs deeper investigation as to how such boundaries are realized in practice.