feat: `dag-jose` implementation using IPLD schema/code generation #23

smrz2001 · 2021-11-13T13:36:23Z

`dag-jose` implementation using IPLD schema/code generation - #4

Description

This PR reimplements the existing dag-jose codec using code generated using an IPLD schema.

The primary motivation for this was to move away from custom, handwired IPLD Node assemblers that were harder to debug.

Input data can have "general" and "flattened" serialization, encoded data always has "general" serialization.

How Has This Been Tested?

Unit tested
Tested successfully against all fixtures from this PR.
ipfs dag put --store-codec dag-jose and ipfs dag get work correctly

Definition of Done

Before submitting this PR, please make sure:

I have added relevant tests for new or updated functionality
My code follows conventions, is well commented, and easy to understand
My code builds and tests pass without any errors or warnings
I have tagged the relevant reviewers
I have updated the READMEs of affected packages
I have made corresponding changes to the documentation
The changes have been communicated to interested parties

TODOs (for subsequent PRs):

Encode Compact Serialization JOSE objects

References:

https://specs.ipld.io/block-layer/codecs/dag-jose.html#mapping-from-the-jose-general-json-serialization-to-dag-jose-serialization

warpfork · 2021-11-16T00:05:21Z

dagjose/gen/gen.go

+		// `link` is not encoded as part of DAG-JOSE because it is not included in the DAG-JOSE spec but is included
+		// here in the schema because it is required when decoding/encoding from/to other encodings (e.g. DAG-JSON).
+		// If `payload` is present during decode, `link` is added with contents matching `payload`. If `link` is present
+		// during encode, it is validated against `payload` and then ignored.
+		schema.SpawnStructField("link", "Link", true, false),


(just annotating as I go in reading)

This initially seemed strange (and would be, in almost any other use of schemas)... but as long as I keep reminding myself that this isn't really a general purpose schema, it's just meant to used inside one specific codec which is doing special logical operations such as this one described in the comment... then it seems fine.

(And I definitely appreciate the comment being here.)

-- wait, sorry. I don't understand what's going on here at all.

I initially assumed these definitions matched what's in the dag-jose spec, but they don't? There's no link fields in this structure as it is stated there?

This is related to my confusion in the other comment in https://github.com/ceramicnetwork/go-dag-jose/pull/23/files?diff=unified&w=0#r749816426 .

I'm still wrapping my head around how builders/assemblers can/should be used 😕

I was trying to match the schema from the spec but needed link in there to allow "copying" into the dag-jose builder from another encoding that included link, e.g. dag-json. That way, I could accept it into the builder then drop it on the floor during encode.

Another thing I could do would be to use the same type of logic I used to "unflatten" nodes (i.e. "flattened" node -> reconstructed map -> "general" node) to skip link fields and then remove them from the schema.

I think I understand what's up now. Is "DecodedJOSE" in the code here a combination of the "DecodedJWE" and "DecodedJWS" types as stated in the schema?

I think I understand the desire there (to minimize code?) but this seems relatively risky. It means we need the "figure out which one a message is, and reject the other fields" in application logic again. We see some of that with the code around the "link" field, but I sorta suspect that's not all of it, because there's a lot more fields in the "DecodedJWE" struct spec than in the "DecodedJWS" struct spec.

I wonder if this wouldn't be safer and get more work "done for free" by just making both those types fully, and when decoding, just try decoding into each type. If one fails, try the other; if they both fail, then decode fails.

I think what I'd want to test here, in any case, is to make sure we know what fields actually show up when iterating over the data model. As soon as I can find some time, I'll propose some more fixtures which probe this. I want to make sure that this implementation and the JS implementation are aligned on that matter, because if not, higher level things like Selectors will start producing nasty surprises. It's possible that the combination of the dag-jose and the dag-json fixtures in ipld/codec-fixtures#11 test that, but if so, it's pretty indirectly and a bit hard to call it "legible", so I'd like to add more fixtures which make it clearer. (And get them exercised from this PR, too, so we're sure.)

(I'm a little scared that something in the implementation detail of thwacking these two structures together will "leak". Maybe it's nothing, but some fixtures shining a light directly on the question of "what paths do we visit, when walking the returned nodes?" would answer it definitively.)

Hmm, that makes sense. I like the idea of being able to fully rely on the schema to validate the types instead of having an ugly mixture of some schema-based rules and some application ones.

I started out with distinct types but then later combined them to reduce cardinality. It kept things simpler and easier to reason about while prototyping (and that's also how it was done in the prior version of the code, presumably for the same reason lol). I seem to recall some other hurdle I ran into when going with that decision but I've been through so many experiments I can't remember what exactly it was.

Now that I feel much more comfortable with IPLD node assemblers though, I don't think it will be difficult to make and switch to distinct types.

link will continue to be a special case. We can't encode it (to maintain compatibility with JS encodings) and I also didn't want to not return it from decode so that I could keep behavior identical to JS decode.

I do like your thought of not conflating encode/decode and presentation, so having a separate presentation layer could be clearer and more helpful. That's starting to sound like an ADL lol (IIUC).

ipld/ipld#151 are the fixtures I propose to help clarify this.

warpfork · 2021-11-16T00:53:58Z

dagjose/multicodec.go

+	// DAG-CBOR is a superset of DAG-JOSE and can be used to decode valid DAG-JOSE objects. Use DAG-CBOR decoding but do
+	// not allow IPLD Links. See: https://specs.ipld.io/block-layer/codecs/dag-jose.html
+	err := dagcbor.DecodeOptions{
+		AllowLinks: false,
+	}.Decode(joseBuilder, r)


I was going to say that this seems unnecessary (because if you're decoding it into something with a schema, that should already enforce this rule and more, right?)... but then remembered the "link" field, which has these special rules about how it should be synthesized. So, okay, maybe it makes sense to do this.

But it draws attention to the fact we seem to be using the types as both a decoding target, and as something we return to people to look at... despite those being logically not the same thing.

I can't tell if this is strange looking but fine in both spec and code, or code worth worrying about but still spec fine; or if this hints towards something in the spec that's weird enough that it makes me want to review the spec again.

@rvagg, I'm not sure you have any free time, but if you do and could have a look at how this link property is derived and if that makes sense to you, I'd kinda appreciate a doublecheck.

Yeah, I'm quite unsatisfied with the contortions I've had to make to match the behavior from the spec. I'm sure much of that is probably my inability to figure out how best to leverage code generation and the general Node assembly framework.

I just pushed what is likely to be the last significant code update to the main body of the codec that adds support for accepting "flattened" serialization. Wanted to put it up quickly (though not quite refined yet) so you and @rvagg can look at that too.

I have a feeling that some of the utility methods I've added to force the codec into submission might not pass muster 😅 Please let me know if there are better ways to do what I'm trying to achieve, mainly to re-assemble constructed nodes into a different shape (e.g. from a "flattened" node to a "general" node).

link property is derived like this:

take the bytes in the payload (this should be a CID)

encode them as a CID Link (in js land that's a CID instance)

put it at the link property

warpfork · 2021-11-16T01:08:26Z

dagjose/multicodec.go

+	// If the passed `NodeAssembler` is not of type `_DecodedJOSE__ReprBuilder`, create and use a
+	// `_DecodedJOSE__ReprBuilder`.
+	joseBuilder, alreadyJose := na.(*_DecodedJOSE__ReprBuilder)
+	if !alreadyJose {
+		joseBuilder = Type.DecodedJOSE__Repr.NewBuilder().(*_DecodedJOSE__ReprBuilder)
+	}


This successfully detects and shortcuts if someone hands in the builder form...

But not if they hand in the assembler, for whatever reason.

I had to double-check myself quick on how this works, and so came out with a gist: https://play.golang.org/p/zdvRuU3WY1g . Moral of the gist is: you can certainly make code that works on either type, but unfortunately when doing the assertions to see what was handed in, you might want to do two of them, so either builder or assembler form will both hit the fastpath.

(My apologies for this being a little confusing. A lot of this area was designed with "maximum fast" in mind, and the ease-of-use is correspondingly... well. I wish I could think of a way to make this simpler!)

Ahh, gotcha, will update, ty.

Hmm, looks like the coercion doesn't work with the way the builder/assembler are generated 😞

type _DecodedJOSE__ReprBuilder struct { _DecodedJOSE__ReprAssembler } type _DecodedJOSE__ReprAssembler struct { thing int // n.b. same name "thing", same int type }

./prog.go:37:41: cannot convert assembler (type *_DecodedJOSE__ReprAssembler) to type *_DecodedJOSE__ReprBuilder

versus the way it's in the example you sent:

type _DecodedJOSE__ReprBuilder struct { thing int } type _DecodedJOSE__ReprAssembler struct { thing int // n.b. same name "thing", same int type }

Or is there some other way to do this that I'm missing?

It works like this but I'm not sure if it like it:

builder := &_DecodedJOSE__ReprBuilder{*assembler} _ = builder

But, you know what? It might not be necessary to coerce the assembler into a builder.

The fastpath would be to pass either a JOSE builder or a JOSE assembler on to CBOR decode.

Could I do something like this?

copyRequired := false jwsBuilder, castOk := na.(*_DecodedJWS__ReprBuilder) if !castOk { // This could still be `_DecodedJWS__ReprAssembler`, so check for that. _, castOk := na.(*_DecodedJWS__ReprAssembler) if !castOk { // No fastpath possible, just create a new `_DecodedJWE__ReprBuilder`, use it, then copy the built node into // the assembler the caller passed in. jwsBuilder = Type.DecodedJWS__Repr.NewBuilder().(*_DecodedJWS__ReprBuilder) copyRequired = true } } if err := dagcbor.Decode(jwsBuilder, r); err != nil { return err }

* Introduce tests using fixtures from the ipld/ipld repo. * tweak some fixture hunk names. See commit message in the linked commit in the ipld/ipld submodule. (Calling both the dagjose and the dagjson hunks "data", as it was previously, suggested incorrect things: they are not that similar. The dagjson is the datamodel view, and that is *not* the same thing as a transliteration of the tokens that you'd get if you parsed the dag-jose as dag-cbor.)

…agjose-schema

warpfork · 2021-11-26T00:34:21Z

dagjose/spec_test.go

+						n, err := ipld.Decode(fixtureDataBinary, Decode)
+						if err != nil {
+							t.Fatalf("%s", err)
+						}


You still have a logically identical n from the parent scope -- you can just keep using it here and skip these lines

Ahh right, will do.

warpfork · 2021-11-26T00:36:27Z

dagjose/spec_test.go

+						encoder, err := linkSystem.EncoderChooser(dagJOSELink)
+						if err != nil {
+							t.Fatalf("could not choose an encoder: %v", err)
+						}
+						hasher, err := linkSystem.HasherChooser(dagJOSELink)
+						if err != nil {
+							t.Fatalf("could not choose a hasher: %v", err)
+						}
+						err = encoder(n, hasher)
+						if err != nil {
+							t.Fatalf("%s", err)
+						}
+						lnk := dagJOSELink.BuildLink(hasher.Sum(nil))
+						cidLink, ok := lnk.(cidlink.Link)
+						if !ok {
+							t.Fatalf("%s", err)
+						}


whoa whoa whoa. I'm pretty sure this is all just linkSystem.ComputeLink isn't it? You can do way less work than this :P

Hah, copy-pasta failure there. Fixed.

warpfork · 2021-11-26T00:37:30Z

dagjose/spec_test.go

+						if !ok {
+							t.Fatalf("%s", err)
+						}
+						fixtureCidString := strings.Replace(string(fixtureCid.Hunk.Body), "\n", "", -1)


Stylistic nit, but I'd tend to use strings.TrimSpace for this. Needs many fewer arguments.

Remove redundant code

smrz2001 marked this pull request as draft November 13, 2021 13:36

This was referenced Nov 13, 2021

Support dag-jose codec by default ipfs/kubo#8364

Closed

feat: dag-jose implementation using IPLD schema/code generation #22

Closed

feat(dagjose): add encoder and decoder methods #13

Closed

smrz2001 force-pushed the feat/dagjose-schema branch from 523609e to 4a8bcb0 Compare November 14, 2021 14:16

smrz2001 added the open mainnet label Nov 14, 2021

smrz2001 self-assigned this Nov 14, 2021

smrz2001 requested review from alexjg, decentralgabe, oed, stbrody and v-stickykeys November 14, 2021 17:21

smrz2001 marked this pull request as ready for review November 14, 2021 17:21

feat: dag-jose implementation using IPLD schema/code generation

f1fcd6d

smrz2001 force-pushed the feat/dagjose-schema branch from b120c5b to f1fcd6d Compare November 14, 2021 18:31

smrz2001 added 2 commits November 14, 2021 16:40

cleanup

b492074

support for "flattened" JOSE objects

f1e1565

warpfork reviewed Nov 16, 2021

View reviewed changes

smrz2001 added 7 commits November 16, 2021 14:30

updates from review

9a2c58a

comments and renaming

4f4cc32

validate (then ignore) link when encoding

8c43685

updates and test fixes

1e13125

first pass of unit tests

9e74eec

update dependencies

3176246

test fixes

074958e

smrz2001 mentioned this pull request Nov 19, 2021

Introduce tests using fixtures from the ipld/ipld repo. #24

Merged

8 tasks

warpfork and others added 2 commits November 18, 2021 21:56

Merge remote-tracking branch 'origin/feat/dagjose-schema' into feat/d…

b8b195c

…agjose-schema

smrz2001 added 10 commits November 18, 2021 23:09

test fixes

81244a6

test fixes

bfeb8fe

test fixes

67cf807

test fixes

837389d

cleanup docs

be1fa03

remove link from encoded JWS

cf9f2fd

fix up tests

7b15497

minor updates and add TODOs

b695464

add CID matching to spec tests

a8f96a0

remove travis.yml file and add TODOs to README.md

4bd8a79

smrz2001 mentioned this pull request Nov 25, 2021

Integrate go-dag-jose plugin ipfs/kubo#8569

Merged

7 tasks

warpfork reviewed Nov 26, 2021

View reviewed changes

smrz2001 added 5 commits November 25, 2021 19:38

Update spec_test.go

1893006

Remove redundant code

updates from review

411bf52

update to latest ipld repo

5573e40

use newest go-ipld-prime

062415b

check err from traversal

d17634a

smrz2001 merged commit 0029d76 into main Nov 26, 2021

smrz2001 deleted the feat/dagjose-schema branch November 26, 2021 01:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: `dag-jose` implementation using IPLD schema/code generation #23

feat: `dag-jose` implementation using IPLD schema/code generation #23

smrz2001 commented Nov 13, 2021 •

edited

Loading

warpfork Nov 16, 2021

warpfork Nov 16, 2021

smrz2001 Nov 16, 2021

smrz2001 Nov 16, 2021

warpfork Nov 16, 2021

warpfork Nov 16, 2021 •

edited

Loading

smrz2001 Nov 16, 2021

smrz2001 Nov 16, 2021

warpfork Nov 19, 2021

warpfork Nov 16, 2021

warpfork Nov 16, 2021

smrz2001 Nov 16, 2021 •

edited

Loading

oed Nov 16, 2021

warpfork Nov 16, 2021

smrz2001 Nov 16, 2021

smrz2001 Nov 16, 2021 •

edited

Loading

smrz2001 Nov 16, 2021

smrz2001 Nov 16, 2021

warpfork Nov 26, 2021

smrz2001 Nov 26, 2021

warpfork Nov 26, 2021

smrz2001 Nov 26, 2021

warpfork Nov 26, 2021

feat: dag-jose implementation using IPLD schema/code generation #23

feat: dag-jose implementation using IPLD schema/code generation #23

Conversation

smrz2001 commented Nov 13, 2021 • edited Loading

dag-jose implementation using IPLD schema/code generation - #4

Description

How Has This Been Tested?

Definition of Done

TODOs (for subsequent PRs):

References:

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

warpfork Nov 16, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

smrz2001 Nov 16, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

smrz2001 Nov 16, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

feat: `dag-jose` implementation using IPLD schema/code generation #23

feat: `dag-jose` implementation using IPLD schema/code generation #23

smrz2001 commented Nov 13, 2021 •

edited

Loading

`dag-jose` implementation using IPLD schema/code generation - #4

warpfork Nov 16, 2021 •

edited

Loading

smrz2001 Nov 16, 2021 •

edited

Loading

smrz2001 Nov 16, 2021 •

edited

Loading