want encoder and decoder methods #4

Closed
Tracked by #8399
warpfork opened this issue Aug 26, 2021 · 11 comments
@warpfork
Contributor

It would be nice to have this code export some functions called Encode and Decode that match https://pkg.go.dev/github.com/ipld/go-ipld-prime@v0.12.0/codec#Encoder and https://pkg.go.dev/github.com/ipld/go-ipld-prime@v0.12.0/codec#Decoder.

This would be a prerequisite for getting this to land as a plugin in go-ipfs, because there, things need to conform to those function interfaces so they can be put in a registry map keyed by their multicodec indicator code. (The git plugin is an example of what gets wired up: https://github.com/ipfs/go-ipfs/blob/ae09459e3926d687599638abdb379e8627b53509/plugin/plugins/git/git.go#L40-L41 )

I think the StoreJOSE and LoadJOSE functions are probably close, so all the right logic exists somewhere, but the interfaces don't quite line up.
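
For concreteness, a minimal sketch of the requested surface; the signatures are the ones codec.Encoder and codec.Decoder require, while the bodies and package layout here are placeholders, not a claim about how the real implementation should look:

package dagjose

import (
	"io"

	"github.com/ipld/go-ipld-prime/datamodel"
)

// Encode matches codec.Encoder: serialize n into w.
func Encode(n datamodel.Node, w io.Writer) error {
	// ... walk n and write the dag-jose serial form (placeholder) ...
	return nil
}

// Decode matches codec.Decoder: read the serial form from r and build the node via na.
func Decode(na datamodel.NodeAssembler, r io.Reader) error {
	// ... parse r and assemble the node into na (placeholder) ...
	return nil
}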

@alexjg
Contributor

alexjg commented Aug 26, 2021

Do I understand correctly that, given that dag-jose uses dag-cbor under the hood, the Encode and Decode functions would be the same as those exported by the cbor codec?

@oed
Member

oed commented Aug 26, 2021

> Do I understand correctly that, given that dag-jose uses dag-cbor under the hood, the Encode and Decode functions would be the same as those exported by the cbor codec?

This is not really correct. If the data were decoded using dag-cbor, the payload, signature, and protected fields would be byte arrays; we want them to be base64url strings.

@alexjg
Contributor

alexjg commented Aug 26, 2021

I think that might be specific to the JavaScript implementation, no? In this implementation the conversion between the general JSON serialization and the ipld.Node implementations is handled separately. That is, if you have a JSON representation of a JWS|JWE you first call dagjose.Parse(JWS|JWE), which converts any base64url strings to []byte, and then pass the resulting dagjose.Dag(JWS|JWE), which implements ipld.Node, to the go-ipld-prime machinery to encode. Similarly, if you have a dagjose.Dag(JWE|JWS) you can call .GeneralJSONSerialization, which encodes any []byte as base64url strings.

This is in line with the spec, which says that any field that is a base64url string in the JOSE spec should be stored as a byte array in the IPLD representation.
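
A rough sketch of that flow in Go; the import path and exact signatures are assumptions based on the names used above, not verified against the package:

package example

import (
	"github.com/alexjg/go-dag-jose/dagjose" // import path is a guess
)

// roundTrip illustrates the flow described above: JSON general serialization
// -> dagjose.DagJWS (an ipld.Node) -> JSON general serialization again.
// ParseJWS and GeneralJSONSerialization follow the naming in this thread;
// their exact signatures are assumptions.
func roundTrip(jsonIn []byte) ([]byte, error) {
	jws, err := dagjose.ParseJWS(jsonIn) // base64url strings become []byte here
	if err != nil {
		return nil, err
	}
	// jws implements ipld.Node, so the go-ipld-prime machinery can encode it.
	out := jws.GeneralJSONSerialization() // []byte fields back to base64url strings
	return []byte(out), nil
}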

@oed
Member

oed commented Aug 26, 2021

I don't think that's right. I think the ipld.Node should contain the JWS/JWE in the general encoding (+/- the link property). The base64url strings should be converted to bytes when encoded to the blockstore.

@alexjg
Contributor

alexjg commented Aug 26, 2021

When you say the ipld.Node should contain the general encoding, do you mean that the ipld.Node implementation this package provides should be just a newtype wrapper around a map[string]interface{} containing the JSON general serialization? If so, could you explain what the reason for doing that would be? Or have I misunderstood what you're suggesting?

The reasoning behind the current architecture is that at some point we have to parse the JSON serialization anyway in order to validate that we are writing spec-compliant data, so storing the underlying representation as a struct rather than as a map[string]interface{} doesn't cost us anything. Additionally, this decision is not exposed at the API level: from the perspective of API users, they are using a dagjose.DagJOSE object regardless of whether it's a newtype over a map[string]interface{} or a struct.

@oed
Member

oed commented Aug 26, 2021

Ok, I'm probably missing something here.

Let me retrace my steps and instead check whether the following assumptions are correct.

Put data into the dag

  1. Call ipfs.dag.put(JWS, { format: 'dag-jose' }) on the http api
  2. Encode gets called and returns a datamodel.Node
  3. The Node gets encoded into a byte array and put into the blockstore

Get data from the dag

  1. Call ipfs.dag.get(cid) on the http api
  2. Bytes are retrieved from the blockstore and converted into a Node
  3. Encode gets called with the node and converts it to the json data
  4. It returns it to the http request

Alright, from writing this out it's clear I'm definitely missing parts of what's going on there! Do the Encode/Decode functions mimic the functionality provided by js-multiformats?

@alexjg
Contributor

alexjg commented Aug 26, 2021

Ah, I see what you're getting at. I think there's a slight confusion because the ipld-prime APIs are not the same as the js-multiformats ones, but there is also an underlying problem we need to solve.

In the dag/put command the incoming data is assumed to be in json format, and the default format to write to IPFS is assumed to be cbor. These can be overridden by setting the input-enc and format arguments respectively. The IPFS API then uses its encoder registry to look up implementations of Encoder and Decoder for the codecs you've given.

This is possibly made clearer by examining the Encoder type:

type Encoder func(datamodel.Node, io.Writer) error
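
and the matching Decoder type from the same codec package, which is its mirror image:

type Decoder func(datamodel.NodeAssembler, io.Reader) error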

There is no parameter to specify the codec (unlike in the js-multiformats Block.encode); it is assumed that by this point you already know how you want to encode the Node implementation.

Note also that datamodel.Node is an interface, which is not implemented by map[string]interface{} (Go does not allow defining methods on built-in types like maps), so at the very least we have to wrap the raw map[string]interface{} of the underlying JSON serialization in a newtype wrapper and provide constructors for it, which is what I was alluding to in my previous comment.

Now, I think what you are expecting is that the user posts a general JSON serialization of a JOSE object with input-enc set to json and format set to dag-jose; the IPFS API uses the json decoder to decode this to an ipld.Node and then uses the dag-jose encoder to encode to CBOR. This final step is where you are expecting the translation from base64url strings to CBOR byte arrays to occur?

If this is the case then I understand why it is desirable. It would be nice for users to be able to just send JSON to the IPFS API without knowing anything about the dag-jose encoding.

Unfortunately the dag-json encoding (which is what the json input format corresponds to) does not allow encoding byte arrays. We could modify the dagjose.NodeAssembler to accept strings as well as byte arrays everywhere it currently accepts byte arrays, which is what I think you are suggesting (see the sketch below). This would violate the contract that Encoder and Decoder are expected to round-trip each other's data. @warpfork, is that a serious problem? To be clear, the round-trip failure would be that base64url-encoded strings get turned into byte arrays.
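
To make that concrete, a minimal sketch of the accommodation; joseBytesAssembler is a hypothetical stand-in for whichever assembler currently handles the byte-array fields:

package dagjose

import "encoding/base64"

// joseBytesAssembler is hypothetical; it stands in for the assembler that
// handles byte-array fields such as payload, signature, and protected.
type joseBytesAssembler struct {
	assignBytes func([]byte) error
}

// AssignString accepts a base64url string where bytes are expected, decoding
// it so that the stored IPLD representation stays a byte array per the spec.
func (a *joseBytesAssembler) AssignString(s string) error {
	raw, err := base64.RawURLEncoding.DecodeString(s) // JOSE uses unpadded base64url
	if err != nil {
		return err
	}
	return a.assignBytes(raw)
}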

This is a little ad hoc in my opinion; we're effectively hacking around the fact that dag-json does not allow byte arrays. I think there are two more technically appealing alternatives:

  1. Require that clients use an encoding which does support byte arrays when dag.putting dag-jose data.
  2. Change the dag-jose spec to specify that the underlying elements are encoded as strings in the IPLD data model.

@oed
Member

oed commented Aug 26, 2021

Thanks for clarifying @alexjg!
To me it seems like this boils down to whether the js-ipfs-http-client posts json or cbor to the ipfs http api. From some of the experiments that @Geo25rey has been conducting, it seems like it does post using the cbor format, so maybe we can assume that?

@alexjg
Contributor

alexjg commented Aug 26, 2021

That would suggest that just using the dag-cbor Encoder would be sufficient. We should do some experiments to check (for what it's worth, I did do some interop tests with the JS dag-jose stuff when I wrote this, which indicates that it should probably work okay, but I don't think I was using the dag API for those tests, just the block API).

Even if that does the trick, I wonder if it's worth having an Encoder which returns an error if the data isn't a valid dag-jose object. This would be tricky because the encoder would need some state to keep track of what it has and hasn't written, but it might be worth it to minimize the chances of applications encountering bad data.
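
A sketch of what such a guard could look like, assuming delegation to the dag-cbor encoder; isValidJOSE is hypothetical and far shallower than a real spec-compliance check would be:

package dagjose

import (
	"fmt"
	"io"

	"github.com/ipld/go-ipld-prime/codec/dagcbor"
	"github.com/ipld/go-ipld-prime/datamodel"
)

// validatingEncode rejects nodes that don't look like JOSE objects before
// handing them to the dag-cbor encoder.
func validatingEncode(n datamodel.Node, w io.Writer) error {
	if !isValidJOSE(n) {
		return fmt.Errorf("dagjose: node is not a valid JOSE object")
	}
	return dagcbor.Encode(n, w)
}

// isValidJOSE is a placeholder check: a JWS must have a payload field and a
// JWE must have a ciphertext field.
func isValidJOSE(n datamodel.Node) bool {
	if _, err := n.LookupByString("payload"); err == nil {
		return true
	}
	_, err := n.LookupByString("ciphertext")
	return err == nil
}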

@warpfork
Contributor Author

warpfork commented Aug 29, 2021

I had thought that the serial content for dag-jose was going to be dag-cbor, yes...?

To try to briefly re-ground this in definitions: the serial encoding a codec uses could be anything, but the one hard line is that if we're going to call it a "codec", then it must define a mapping from the serial form to a data model form, and back again. So it needs to pick a serial form. If there's going to be a cbor-ish version and a json-ish version, we would need to reserve two separate multicodecs for that. So I think for now we probably want to say that dag-jose has a cborish encoding, and move on with that.

(The fact that it would seem easy to define how these JOSE features work in a serial-format-agnostic way is why I tried to steer the conversation in the direction of ADLs instead of codecs way back at the beginning of this work -- but I guess that ship has largely sailed now. Ah well.)


Go-ipfs's APIs, with their --format and --input-enc and whatnot flags, are probably best thought of as a step separated from what an IPLD codec is. Having different codec names for those two parameters means that ipfs is doing a transcode operation. I'm not actually the expert on all the ins and outs of those go-ipfs APIs, but I think the following is true:

IIUC, to use dag-jose, one would use ipfs dag put --format=dag-jose.

IIUC, to pipe data in most directly, without transcoding, one would want to say ipfs dag put --format=dag-jose --input-enc=dag-jose and then pipe in serialized data already in that form.

Incidentally, IIUC, the ipfs dag put --help docs that have said the default input format is "json" have actually been a slight lie: behaviorally, the default has always been dag-json. IIUC, the next release of go-ipfs will fix the weirdness and update the command to state that its default is dag-json (so, behaviorally, nothing changes; it's just less confusing now). This is potentially relevant because dag-json actually does support bytes. (Although I think one would be better off stating --input-enc=dag-jose anyway, so maybe this isn't ultimately all that relevant after all.)

I don't think there's a requirement that any possible pair of codec names has to be valid for --format and --input-enc. The go-ipfs code that handles them is just going to use the --input-enc codec to turn the input data into IPLD Data Model, and then serialize that out again with the --format codec (sketched below). So, if one of them happens to be a subset of the other, that'll either cramp the expressiveness possible in that pipeline, or potentially result in an outright error (if the --format codec can't accept some structure that a user created by feeding it in with a more expressive --input-enc). But this should be sort of fine, because again, the one thing that should always be guaranteed to work is giving the same codec name for both --format and --input-enc.
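
Schematically, that pipeline is just the following (a sketch, not go-ipfs's actual plumbing):

package example

import (
	"io"

	"github.com/ipld/go-ipld-prime/codec"
	"github.com/ipld/go-ipld-prime/node/basicnode" // "node/basic" in older versions
)

// transcode decodes the input with the --input-enc codec and re-encodes the
// resulting Data Model node with the --format codec.
func transcode(in io.Reader, out io.Writer, decode codec.Decoder, encode codec.Encoder) error {
	nb := basicnode.Prototype.Any.NewBuilder()
	if err := decode(nb, in); err != nil {
		return err
	}
	return encode(nb.Build(), out)
}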

We can hammer all of this out in exact detail with PRs in go-ipfs where we wire up this repo as a plugin, get the encode and decode functions into go-ipfs's multicodec registries, and then test it end-to-end (probably using "sharness" tests) with all sorts of variations of args that do transcoding. I think I'd advise that we first get some serial fixtures that make sense to us out of the encode and decode functions alone, though -- clearer to do that and make sure it's well-defined in isolation. Something like the round-trip test sketched below.
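
Here dagjose.Encode and dagjose.Decode are the functions this issue asks for, so this is aspirational, the import path is a guess, and the fixture bytes are a placeholder:

package dagjose_test

import (
	"bytes"
	"testing"

	"github.com/alexjg/go-dag-jose/dagjose" // import path is a guess
	"github.com/ipld/go-ipld-prime/node/basicnode"
)

// TestFixtureRoundTrip decodes known-good serial bytes and re-encodes them,
// requiring byte-for-byte equality.
func TestFixtureRoundTrip(t *testing.T) {
	fixture := []byte{ /* known-good dag-jose serial bytes go here */ }

	nb := basicnode.Prototype.Any.NewBuilder()
	if err := dagjose.Decode(nb, bytes.NewReader(fixture)); err != nil {
		t.Fatal(err)
	}

	var buf bytes.Buffer
	if err := dagjose.Encode(nb.Build(), &buf); err != nil {
		t.Fatal(err)
	}
	if !bytes.Equal(fixture, buf.Bytes()) {
		t.Fatalf("round-trip mismatch: %x != %x", fixture, buf.Bytes())
	}
}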

@Geo25rey

The difference between dag-cbor and cbor is that dag-cbor can serialize the Link type (otherwise known as a CID) while cbor cannot. Over the wire there are no CIDs in the spec, but there is the link field when using dag-jose objects locally. I have submitted PR #3 with the fixes that have gotten the codec working with IPFS using cbor in the past. Feel free to have a look @alexjg @oed
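
For illustration, a sketch of that difference using go-ipld-prime's qp helpers; the helper names and shapes here are from memory and may not match the version in use exactly:

package example

import (
	"bytes"

	"github.com/ipfs/go-cid"
	"github.com/ipld/go-ipld-prime/codec/dagcbor"
	"github.com/ipld/go-ipld-prime/datamodel"
	"github.com/ipld/go-ipld-prime/fluent/qp"
	cidlink "github.com/ipld/go-ipld-prime/linking/cid"
	"github.com/ipld/go-ipld-prime/node/basicnode"
)

// encodeWithLink builds a map with a link field and encodes it with dag-cbor,
// which represents the link as CBOR tag 42; plain cbor has no such mapping.
func encodeWithLink(c cid.Cid) ([]byte, error) {
	n, err := qp.BuildMap(basicnode.Prototype.Any, 1, func(ma datamodel.MapAssembler) {
		qp.MapEntry(ma, "link", qp.Link(cidlink.Link{Cid: c}))
	})
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	if err := dagcbor.Encode(n, &buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}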
