want encoder and decoder methods #4

Closed
Tracked by #8399
warpfork opened this issue Aug 26, 2021 · 11 comments
@warpfork
Contributor

It would be nice to have this code export some functions called Encode and Decode that match https://pkg.go.dev/github.com/ipld/go-ipld-prime@v0.12.0/codec#Encoder and https://pkg.go.dev/github.com/ipld/go-ipld-prime@v0.12.0/codec#Decoder.

This would be a prerequisite for getting this to land as a plugin in go-ipfs, because there, things need to conform to those function interfaces so they can be put in a registry map keyed by their multicodec indicator code. (The git plugin is an example of what gets wired up: https://github.com/ipfs/go-ipfs/blob/ae09459e3926d687599638abdb379e8627b53509/plugin/plugins/git/git.go#L40-L41 )

I think the StoreJOSE and LoadJOSE functions are probably close, so all the right logic exists somewhere, but the interfaces don't quite line up.
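
For concreteness, a minimal sketch of the requested surface; the signatures are the ones codec.Encoder and codec.Decoder require, while the bodies and package layout here are placeholders, not a claim about how the real implementation should look:

package dagjose

import (
	"io"

	"github.com/ipld/go-ipld-prime/datamodel"
)

// Encode matches codec.Encoder: serialize n into w.
func Encode(n datamodel.Node, w io.Writer) error {
	// ... walk n and write the dag-jose serial form (placeholder) ...
	return nil
}

// Decode matches codec.Decoder: read the serial form from r and build the node via na.
func Decode(na datamodel.NodeAssembler, r io.Reader) error {
	// ... parse r and assemble the node into na (placeholder) ...
	return nil
}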

@alexjg
Contributor

alexjg commented Aug 26, 2021

Do I understand correctly that, given that dag-jose uses dag-cbor under the hood, the Encode and Decode functions would be the same as those exported by the cbor codec?

@oed
Member

oed commented Aug 26, 2021

> Do I understand correctly that, given that dag-jose uses dag-cbor under the hood, the Encode and Decode functions would be the same as those exported by the cbor codec?

This is not really correct. If the data were decoded using dag-cbor, the payload, signature, and protected fields would be byte arrays; we want them to be base64url strings.

@alexjg
Contributor

alexjg commented Aug 26, 2021

I think that might be specific to the JavaScript implementation, no? In this implementation the conversion between the general JSON serialization and the ipld.Node implementations is handled separately. That is, if you have a JSON representation of a JWS|JWE you first call dagjose.Parse(JWS|JWE), which converts any base64url strings to []byte, and then pass the resulting dagjose.Dag(JWS|JWE), which implements ipld.Node, to the go-ipld-prime machinery to encode. Similarly, if you have a dagjose.Dag(JWE|JWS) you can call .GeneralJSONSerialization, which encodes any []byte as base64url strings.

This is in line with the spec, which says that any field that is a base64url string in the JOSE spec should be stored as a byte array in the IPLD representation.
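
A rough sketch of that flow in Go; the import path and exact signatures are assumptions based on the names used above, not verified against the package:

package example

import (
	"github.com/alexjg/go-dag-jose/dagjose" // import path is a guess
)

// roundTrip illustrates the flow described above: JSON general serialization
// -> dagjose.DagJWS (an ipld.Node) -> JSON general serialization again.
// ParseJWS and GeneralJSONSerialization follow the naming in this thread;
// their exact signatures are assumptions.
func roundTrip(jsonIn []byte) ([]byte, error) {
	jws, err := dagjose.ParseJWS(jsonIn) // base64url strings become []byte here
	if err != nil {
		return nil, err
	}
	// jws implements ipld.Node, so the go-ipld-prime machinery can encode it.
	out := jws.GeneralJSONSerialization() // []byte fields back to base64url strings
	return []byte(out), nil
}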

@oed
Member

oed commented Aug 26, 2021

I don't think that's right. I think the ipld.Node should contain the JWS/JWE in the general encoding (+/- the link property). The base64url strings should be converted to bytes when encoded to the blockstore.

@alexjg
Contributor

alexjg commented Aug 26, 2021

When you say the ipld.Node should contain the general encoding, do you mean that the ipld.Node implementation this package provides should be just a newtype wrapper around a map[string]interface{} containing the JSON general serialization? If so, could you explain what the reason for doing that would be? Or have I misunderstood what you're suggesting?

The reasoning behind the current architecture is that at some point we have to parse the JSON serialization anyway in order to validate that we are writing spec-compliant data, so storing the underlying representation as a struct rather than as a map[string]interface{} doesn't cost us anything. Additionally, this decision is not exposed at the API level: from the perspective of API users, they are using a dagjose.DagJOSE object regardless of whether it's a newtype over a map[string]interface{} or a struct.

@oed
Member

oed commented Aug 26, 2021

Ok, I'm probably missing something here.

Let me retrace my steps and instead check whether the following assumptions are correct.

Put data into the dag

  1. Call ipfs.dag.put(JWS, { format: 'dag-jose' }) on the http api
  2. Encode gets called and returns a datamodel.Node
  3. The Node gets encoded into a byte array and put into the blockstore

Get data from the dag

  1. Call ipfs.dag.get(cid) on the http api
  2. Bytes are retrieved from the blockstore and converted into a Node
  3. Encode gets called with the node and converts it to the json data
  4. It returns it to the http request

Alright, from writing this out it's clear I'm definitely missing parts of what's going on there! Do the Encode/Decode functions mimic the functionality provided by js-multiformats?

@alexjg
Contributor

alexjg commented Aug 26, 2021

Ah, I see what you're getting at. I think there's a slight confusion because the ipld-prime APIs are not the same as the js-multiformats ones, but there is also an underlying problem we need to solve.

In the dag/put command the incoming data is assumed to be in json format, and the default format to write to IPFS is assumed to be cbor. These can be overridden by setting the input-enc and format arguments respectively. The IPFS API then uses its encoder registry to look up implementations of Encoder and Decoder for the codecs you've given.

This is possibly made clearer by examining the Encoder type:

type Encoder func(datamodel.Node, io.Writer) error
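
and the matching Decoder type from the same codec package, which is its mirror image:

type Decoder func(datamodel.NodeAssembler, io.Reader) error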

There is no parameter to specify the codec (unlike in the js-multiformats Block.encode); it is assumed that by this point you already know how you want to encode the Node implementation.

Note also that datamodel.Node is an interface, which is not implemented by map[string]interface{} (Go does not allow defining methods on built-in types like maps), so at the very least we have to wrap the raw map[string]interface{} of the underlying JSON serialization in a newtype wrapper and provide constructors for it, which is what I was alluding to in my previous comment.

Now, I think what you are expecting is that the user posts a general JSON serialization of a JOSE object with input-enc set to json and format set to dag-jose; the IPFS API uses the json decoder to decode this to an ipld.Node and then uses the dag-jose encoder to encode to CBOR. This final step is where you are expecting the translation from base64url strings to CBOR byte arrays to occur?

If this is the case then I understand why it is desirable. It would be nice for users to be able to just send JSON to the IPFS API without knowing anything about the dag-jose encoding.

Unfortunately the dag-json encoding (which is what the json input format corresponds to) does not allow encoding byte arrays. We could modify the dagjose.NodeAssembler to accept strings as well as byte arrays everywhere it currently accepts byte arrays, which is what I think you are suggesting (see the sketch below). This would violate the contract that Encoder and Decoder are expected to round-trip each other's data. @warpfork, is that a serious problem? To be clear, the round-trip failure would be that base64url-encoded strings get turned into byte arrays.
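
To make that concrete, a minimal sketch of the accommodation; joseBytesAssembler is a hypothetical stand-in for whichever assembler currently handles the byte-array fields:

package dagjose

import "encoding/base64"

// joseBytesAssembler is hypothetical; it stands in for the assembler that
// handles byte-array fields such as payload, signature, and protected.
type joseBytesAssembler struct {
	assignBytes func([]byte) error
}

// AssignString accepts a base64url string where bytes are expected, decoding
// it so that the stored IPLD representation stays a byte array per the spec.
func (a *joseBytesAssembler) AssignString(s string) error {
	raw, err := base64.RawURLEncoding.DecodeString(s) // JOSE uses unpadded base64url
	if err != nil {
		return err
	}
	return a.assignBytes(raw)
}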

This is a little ad hoc in my opinion; we're effectively hacking around the fact that dag-json does not allow byte arrays. I think there are two more technically appealing alternatives:

  1. Require that clients use an encoding which does support byte arrays when dag.putting dag-jose data.
  2. Change the dag-jose spec to specify that the underlying elements are encoded as strings in the IPLD data model.

@oed
Member

oed commented Aug 26, 2021

Thanks for clarifying @alexjg!
To me it seems like this boils down to whether the js-ipfs-http-client posts json or cbor to the ipfs http api. From some of the experiments that @Geo25rey has been conducting, it seems like it does post using the cbor format, so maybe we can assume that?

@alexjg
Contributor

alexjg commented Aug 26, 2021

That would suggest that just using the dag-cbor Encoder would be sufficient. We should do some experiments to check (for what it's worth, I did do some interop tests with the JS dag-jose stuff when I wrote this, which indicates that it should probably work okay, but I don't think I was using the dag API for those tests, just the block API).

Even if that does the trick, I wonder if it's worth having an Encoder which returns an error if the data isn't a valid dag-jose object. This would be tricky because the encoder would need some state to keep track of what it has and hasn't written, but it might be worth it to minimize the chances of applications encountering bad data.
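
A sketch of what such a guard could look like, assuming delegation to the dag-cbor encoder; isValidJOSE is hypothetical and far shallower than a real spec-compliance check would be:

package dagjose

import (
	"fmt"
	"io"

	"github.com/ipld/go-ipld-prime/codec/dagcbor"
	"github.com/ipld/go-ipld-prime/datamodel"
)

// validatingEncode rejects nodes that don't look like JOSE objects before
// handing them to the dag-cbor encoder.
func validatingEncode(n datamodel.Node, w io.Writer) error {
	if !isValidJOSE(n) {
		return fmt.Errorf("dagjose: node is not a valid JOSE object")
	}
	return dagcbor.Encode(n, w)
}

// isValidJOSE is a placeholder check: a JWS must have a payload field and a
// JWE must have a ciphertext field.
func isValidJOSE(n datamodel.Node) bool {
	if _, err := n.LookupByString("payload"); err == nil {
		return true
	}
	_, err := n.LookupByString("ciphertext")
	return err == nil
}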

@warpfork
Contributor Author

warpfork commented Aug 29, 2021

I had thought that the serial content for dag-jose was going to be dag-cbor, yes...?

To try to briefly re-ground this in definitions: the serial encoding a codec uses could be anything, but the one hard line is that if we're going to call it a "codec", then it must define a mapping from the serial form to a data model form, and back again. So it needs to pick a serial form. If there's going to be a cbor-ish version and a json-ish version, we would need to reserve two separate multicodecs for that. So I think for now we probably want to say that dag-jose has a cborish encoding, and move on with that.

(The fact that it would seem easy to define how these JOSE features work in a serial-format-agnostic way is why I tried to steer the conversation in the direction of ADLs instead of codecs way back at the beginning of this work -- but I guess that ship has largely sailed now. Ah well.)


Go-ipfs's APIs, with their --format and --input-enc and whatnot flags, are probably best thought of as a step separated from what an IPLD codec is. Having different codec names for those two parameters means that ipfs is doing a transcode operation. I'm not actually the expert on all the ins and outs of those go-ipfs APIs, but I think the following is true:

IIUC, to use dag-jose, one would use ipfs dag put --format=dag-jose.

IIUC, to pipe data in most directly, without transcoding, one would want to say ipfs dag put --format=dag-jose --input-enc=dag-jose and then pipe in serialized data already in that form.

Incidentally, IIUC, the ipfs dag put --help docs that have said the default input format is "json" have actually been a slight lie: behaviorally, the default has always been dag-json. IIUC, the next release of go-ipfs will fix the weirdness and update the command to state that its default is dag-json (so, behaviorally, nothing changes; it's just less confusing now). This is potentially relevant because dag-json actually does support bytes. (Although I think one would be better off stating --input-enc=dag-jose anyway, so maybe this isn't ultimately all that relevant after all.)

I don't think there's a requirement that any possible pair of codec names has to be valid for --format and --input-enc. The go-ipfs code that handles them is just going to use the --input-enc codec to turn the input data into IPLD Data Model, and then serialize that out again with the --format codec (sketched below). So, if one of them happens to be a subset of the other, that'll either cramp the expressiveness possible in that pipeline, or potentially result in an outright error (if the --format codec can't accept some structure that a user created by feeding it in with a more expressive --input-enc). But this should be sort of fine, because again, the one thing that should always be guaranteed to work is giving the same codec name for both --format and --input-enc.
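
Schematically, that pipeline is just the following (a sketch, not go-ipfs's actual plumbing):

package example

import (
	"io"

	"github.com/ipld/go-ipld-prime/codec"
	"github.com/ipld/go-ipld-prime/node/basicnode" // "node/basic" in older versions
)

// transcode decodes the input with the --input-enc codec and re-encodes the
// resulting Data Model node with the --format codec.
func transcode(in io.Reader, out io.Writer, decode codec.Decoder, encode codec.Encoder) error {
	nb := basicnode.Prototype.Any.NewBuilder()
	if err := decode(nb, in); err != nil {
		return err
	}
	return encode(nb.Build(), out)
}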

We can hammer all of this out in exact detail with PRs in go-ipfs where we wire up this repo as a plugin, get the encode and decode functions into go-ipfs's multicodec registries, and then test it end-to-end (probably using "sharness" tests) with all sorts of variations of args that do transcoding. I think I'd advise that we first get some serial fixtures that make sense to us out of the encode and decode functions alone, though -- clearer to do that and make sure it's well-defined in isolation. Something like the round-trip test sketched below.
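
Here dagjose.Encode and dagjose.Decode are the functions this issue asks for, so this is aspirational, the import path is a guess, and the fixture bytes are a placeholder:

package dagjose_test

import (
	"bytes"
	"testing"

	"github.com/alexjg/go-dag-jose/dagjose" // import path is a guess
	"github.com/ipld/go-ipld-prime/node/basicnode"
)

// TestFixtureRoundTrip decodes known-good serial bytes and re-encodes them,
// requiring byte-for-byte equality.
func TestFixtureRoundTrip(t *testing.T) {
	fixture := []byte{ /* known-good dag-jose serial bytes go here */ }

	nb := basicnode.Prototype.Any.NewBuilder()
	if err := dagjose.Decode(nb, bytes.NewReader(fixture)); err != nil {
		t.Fatal(err)
	}

	var buf bytes.Buffer
	if err := dagjose.Encode(nb.Build(), &buf); err != nil {
		t.Fatal(err)
	}
	if !bytes.Equal(fixture, buf.Bytes()) {
		t.Fatalf("round-trip mismatch: %x != %x", fixture, buf.Bytes())
	}
}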

@Geo25rey

The difference between dag-cbor and cbor is that dag-cbor can serialize the Link type (otherwise known as a CID) while cbor cannot. Over the wire there are no CIDs in the spec, but there is the link field when using dag-jose objects locally. I have submitted PR #3 with the fixes that have gotten the codec working with IPFS using cbor in the past. Feel free to have a look @alexjg @oed
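
For illustration, a sketch of that difference using go-ipld-prime's qp helpers; the helper names and shapes here are from memory and may not match the version in use exactly:

package example

import (
	"bytes"

	"github.com/ipfs/go-cid"
	"github.com/ipld/go-ipld-prime/codec/dagcbor"
	"github.com/ipld/go-ipld-prime/datamodel"
	"github.com/ipld/go-ipld-prime/fluent/qp"
	cidlink "github.com/ipld/go-ipld-prime/linking/cid"
	"github.com/ipld/go-ipld-prime/node/basicnode"
)

// encodeWithLink builds a map with a link field and encodes it with dag-cbor,
// which represents the link as CBOR tag 42; plain cbor has no such mapping.
func encodeWithLink(c cid.Cid) ([]byte, error) {
	n, err := qp.BuildMap(basicnode.Prototype.Any, 1, func(ma datamodel.MapAssembler) {
		qp.MapEntry(ma, "link", qp.Link(cidlink.Link{Cid: c}))
	})
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	if err := dagcbor.Encode(n, &buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}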
