-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: dag-jose
implementation using IPLD schema/code generation
#23
Conversation
523609e
to
4a8bcb0
Compare
b120c5b
to
f1fcd6d
Compare
dagjose/gen/gen.go
Outdated
// `link` is not encoded as part of DAG-JOSE because it is not included in the DAG-JOSE spec but is included | ||
// here in the schema because it is required when decoding/encoding from/to other encodings (e.g. DAG-JSON). | ||
// If `payload` is present during decode, `link` is added with contents matching `payload`. If `link` is present | ||
// during encode, it is validated against `payload` and then ignored. | ||
schema.SpawnStructField("link", "Link", true, false), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(just annotating as I go in reading)
This initially seemed strange (and would be, in almost any other use of schemas)... but as long as I keep reminding myself that this isn't really a general purpose schema, it's just meant to used inside one specific codec which is doing special logical operations such as this one described in the comment... then it seems fine.
(And I definitely appreciate the comment being here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-- wait, sorry. I don't understand what's going on here at all.
I initially assumed these definitions matched what's in the dag-jose spec, but they don't? There's no link
fields in this structure as it is stated there?
This is related to my confusion in the other comment in https://github.com/ceramicnetwork/go-dag-jose/pull/23/files?diff=unified&w=0#r749816426 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still wrapping my head around how builders/assemblers can/should be used 😕
I was trying to match the schema from the spec but needed link
in there to allow "copying" into the dag-jose
builder from another encoding that included link
, e.g. dag-json
. That way, I could accept it into the builder then drop it on the floor during encode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing I could do would be to use the same type of logic I used to "unflatten" nodes (i.e. "flattened" node -> reconstructed map -> "general" node) to skip link
fields and then remove them from the schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand what's up now. Is "DecodedJOSE" in the code here a combination of the "DecodedJWE" and "DecodedJWS" types as stated in the schema?
I think I understand the desire there (to minimize code?) but this seems relatively risky. It means we need the "figure out which one a message is, and reject the other fields" in application logic again. We see some of that with the code around the "link" field, but I sorta suspect that's not all of it, because there's a lot more fields in the "DecodedJWE" struct spec than in the "DecodedJWS" struct spec.
I wonder if this wouldn't be safer and get more work "done for free" by just making both those types fully, and when decoding, just try decoding into each type. If one fails, try the other; if they both fail, then decode fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what I'd want to test here, in any case, is to make sure we know what fields actually show up when iterating over the data model. As soon as I can find some time, I'll propose some more fixtures which probe this. I want to make sure that this implementation and the JS implementation are aligned on that matter, because if not, higher level things like Selectors will start producing nasty surprises. It's possible that the combination of the dag-jose and the dag-json fixtures in ipld/codec-fixtures#11 test that, but if so, it's pretty indirectly and a bit hard to call it "legible", so I'd like to add more fixtures which make it clearer. (And get them exercised from this PR, too, so we're sure.)
(I'm a little scared that something in the implementation detail of thwacking these two structures together will "leak". Maybe it's nothing, but some fixtures shining a light directly on the question of "what paths do we visit, when walking the returned nodes?" would answer it definitively.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that makes sense. I like the idea of being able to fully rely on the schema to validate the types instead of having an ugly mixture of some schema-based rules and some application ones.
I started out with distinct types but then later combined them to reduce cardinality. It kept things simpler and easier to reason about while prototyping (and that's also how it was done in the prior version of the code, presumably for the same reason lol). I seem to recall some other hurdle I ran into when going with that decision but I've been through so many experiments I can't remember what exactly it was.
Now that I feel much more comfortable with IPLD node assemblers though, I don't think it will be difficult to make and switch to distinct types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
link
will continue to be a special case. We can't encode it (to maintain compatibility with JS encodings) and I also didn't want to not return it from decode so that I could keep behavior identical to JS decode.
I do like your thought of not conflating encode/decode and presentation, so having a separate presentation layer could be clearer and more helpful. That's starting to sound like an ADL lol (IIUC).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ipld/ipld#151 are the fixtures I propose to help clarify this.
dagjose/multicodec.go
Outdated
// DAG-CBOR is a superset of DAG-JOSE and can be used to decode valid DAG-JOSE objects. Use DAG-CBOR decoding but do | ||
// not allow IPLD Links. See: https://specs.ipld.io/block-layer/codecs/dag-jose.html | ||
err := dagcbor.DecodeOptions{ | ||
AllowLinks: false, | ||
}.Decode(joseBuilder, r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to say that this seems unnecessary (because if you're decoding it into something with a schema, that should already enforce this rule and more, right?)... but then remembered the "link"
field, which has these special rules about how it should be synthesized. So, okay, maybe it makes sense to do this.
But it draws attention to the fact we seem to be using the types as both a decoding target, and as something we return to people to look at... despite those being logically not the same thing.
I can't tell if this is strange looking but fine in both spec and code, or code worth worrying about but still spec fine; or if this hints towards something in the spec that's weird enough that it makes me want to review the spec again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rvagg, I'm not sure you have any free time, but if you do and could have a look at how this link property is derived and if that makes sense to you, I'd kinda appreciate a doublecheck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm quite unsatisfied with the contortions I've had to make to match the behavior from the spec. I'm sure much of that is probably my inability to figure out how best to leverage code generation and the general Node assembly framework.
I just pushed what is likely to be the last significant code update to the main body of the codec that adds support for accepting "flattened" serialization. Wanted to put it up quickly (though not quite refined yet) so you and @rvagg can look at that too.
I have a feeling that some of the utility methods I've added to force the codec into submission might not pass muster 😅 Please let me know if there are better ways to do what I'm trying to achieve, mainly to re-assemble constructed nodes into a different shape (e.g. from a "flattened" node to a "general" node).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
link
property is derived like this:
- take the bytes in the payload (this should be a CID)
- encode them as a CID Link (in js land that's a CID instance)
- put it at the
link
property
dagjose/multicodec.go
Outdated
// If the passed `NodeAssembler` is not of type `_DecodedJOSE__ReprBuilder`, create and use a | ||
// `_DecodedJOSE__ReprBuilder`. | ||
joseBuilder, alreadyJose := na.(*_DecodedJOSE__ReprBuilder) | ||
if !alreadyJose { | ||
joseBuilder = Type.DecodedJOSE__Repr.NewBuilder().(*_DecodedJOSE__ReprBuilder) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This successfully detects and shortcuts if someone hands in the builder form...
But not if they hand in the assembler, for whatever reason.
I had to double-check myself quick on how this works, and so came out with a gist: https://play.golang.org/p/zdvRuU3WY1g . Moral of the gist is: you can certainly make code that works on either type, but unfortunately when doing the assertions to see what was handed in, you might want to do two of them, so either builder or assembler form will both hit the fastpath.
(My apologies for this being a little confusing. A lot of this area was designed with "maximum fast" in mind, and the ease-of-use is correspondingly... well. I wish I could think of a way to make this simpler!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, gotcha, will update, ty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, looks like the coercion doesn't work with the way the builder/assembler are generated 😞
type _DecodedJOSE__ReprBuilder struct {
_DecodedJOSE__ReprAssembler
}
type _DecodedJOSE__ReprAssembler struct {
thing int // n.b. same name "thing", same int type
}
./prog.go:37:41: cannot convert assembler (type *_DecodedJOSE__ReprAssembler) to type *_DecodedJOSE__ReprBuilder
versus the way it's in the example you sent:
type _DecodedJOSE__ReprBuilder struct {
thing int
}
type _DecodedJOSE__ReprAssembler struct {
thing int // n.b. same name "thing", same int type
}
Or is there some other way to do this that I'm missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works like this but I'm not sure if it like it:
builder := &_DecodedJOSE__ReprBuilder{*assembler}
_ = builder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, you know what? It might not be necessary to coerce the assembler into a builder.
The fastpath would be to pass either a JOSE builder or a JOSE assembler on to CBOR decode.
Could I do something like this?
copyRequired := false
jwsBuilder, castOk := na.(*_DecodedJWS__ReprBuilder)
if !castOk {
// This could still be `_DecodedJWS__ReprAssembler`, so check for that.
_, castOk := na.(*_DecodedJWS__ReprAssembler)
if !castOk {
// No fastpath possible, just create a new `_DecodedJWE__ReprBuilder`, use it, then copy the built node into
// the assembler the caller passed in.
jwsBuilder = Type.DecodedJWS__Repr.NewBuilder().(*_DecodedJWS__ReprBuilder)
copyRequired = true
}
}
if err := dagcbor.Decode(jwsBuilder, r); err != nil {
return err
}
* Introduce tests using fixtures from the ipld/ipld repo. * tweak some fixture hunk names. See commit message in the linked commit in the ipld/ipld submodule. (Calling both the dagjose and the dagjson hunks "data", as it was previously, suggested incorrect things: they are not that similar. The dagjson is the datamodel view, and that is *not* the same thing as a transliteration of the tokens that you'd get if you parsed the dag-jose as dag-cbor.)
dagjose/spec_test.go
Outdated
n, err := ipld.Decode(fixtureDataBinary, Decode) | ||
if err != nil { | ||
t.Fatalf("%s", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You still have a logically identical n
from the parent scope -- you can just keep using it here and skip these lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh right, will do.
dagjose/spec_test.go
Outdated
encoder, err := linkSystem.EncoderChooser(dagJOSELink) | ||
if err != nil { | ||
t.Fatalf("could not choose an encoder: %v", err) | ||
} | ||
hasher, err := linkSystem.HasherChooser(dagJOSELink) | ||
if err != nil { | ||
t.Fatalf("could not choose a hasher: %v", err) | ||
} | ||
err = encoder(n, hasher) | ||
if err != nil { | ||
t.Fatalf("%s", err) | ||
} | ||
lnk := dagJOSELink.BuildLink(hasher.Sum(nil)) | ||
cidLink, ok := lnk.(cidlink.Link) | ||
if !ok { | ||
t.Fatalf("%s", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoa whoa whoa. I'm pretty sure this is all just linkSystem.ComputeLink
isn't it? You can do way less work than this :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, copy-pasta failure there. Fixed.
dagjose/spec_test.go
Outdated
if !ok { | ||
t.Fatalf("%s", err) | ||
} | ||
fixtureCidString := strings.Replace(string(fixtureCid.Hunk.Body), "\n", "", -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stylistic nit, but I'd tend to use strings.TrimSpace
for this. Needs many fewer arguments.
dag-jose
implementation using IPLD schema/code generation - #4Description
This PR reimplements the existing
dag-jose
codec using code generated using an IPLD schema.The primary motivation for this was to move away from custom, handwired IPLD Node assemblers that were harder to debug.
Input data can have "general" and "flattened" serialization, encoded data always has "general" serialization.
How Has This Been Tested?
ipfs dag put --store-codec dag-jose
andipfs dag get
work correctlyDefinition of Done
Before submitting this PR, please make sure:
TODOs (for subsequent PRs):
References: