Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: add tests to verify allowable data layouts #58

Merged
merged 1 commit into from
Oct 12, 2020
Merged

chore: add tests to verify allowable data layouts #58

merged 1 commit into from
Oct 12, 2020

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented Sep 17, 2020

Trying to pin down a precise schema for DAG-PB.

Ref: ipld/specs#297

Relevant findings:

  1. Most things can be nil.
  2. Links:[] is not possible, zero-length Links becomes nil, this isn't the case for Name or Hash when they are zero length. This also means { Data: nil, Links: [] } === { Data: nil, Links: nil } === {}.
  3. Unmarshalling zero bytes is a valid operation and yields ^ that ({ Data: nil, Links: nil }). I'm not really sure what to do with that information but it's worth noting. I'm sure others already know this but perhaps this should go into the DAG-PB spec if it's allowable by other components of the IPLD stack. So there would be a canonical CID for an empty PB block for any given multihash. That would be bafybeihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku or QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n for SHA2-256. And yep, the latter in Google gives us patch add-link fails to write result to blockstore kubo#1925. That issue also has a discussion where it's stated that zero-length byte arrays should be encoded as nil when marshalling, but it looks like that never happened here at least.

Some of this is at odds with js-ipld-dag-pb, but I'll take that discussion to ipld/specs#297, nothing needs to change here.

@welcome
Copy link

welcome bot commented Sep 17, 2020

Thank you for submitting this PR!
A maintainer will be here shortly to review it.
We are super grateful, but we are also overloaded! Help us by making sure that:

  • The context for this PR is clear, with relevant discussion, decisions
    and stakeholders linked/mentioned.

  • Your contribution itself is clear (code comments, self-review for the
    rest) and in its best form. Follow the code contribution
    guidelines

    if they apply.

Getting other community members to do a review would be great help too on complex PRs (you can ask in the chats/forums). If you are unsure about something, just leave us a comment.
Next steps:

  • A maintainer will triage and assign priority to this PR, commenting on
    any missing things and potentially assigning a reviewer for high
    priority items.

  • The PR gets reviews, discussed and approvals as needed.

  • The PR is merged by maintainers when it has been approved and comments addressed.

We currently aim to provide initial feedback/triaging within two business days. Please keep an eye on any labelling actions, as these will indicate priorities and status of your contribution.
We are very grateful for your contribution!

@rvagg
Copy link
Member Author

rvagg commented Sep 17, 2020

/cc @mvdan for review

pb/forms_test.go Outdated Show resolved Hide resolved
pb/forms_test.go Outdated Show resolved Hide resolved
@rvagg
Copy link
Member Author

rvagg commented Sep 21, 2020

OK, I figured out what to do with this, see summary in ipld/specs#297 (comment)

The important difference is that I've been thinking of nil as Null in the Data Model, which it isn't. So I'm treating nil here as omitted/absent in the Data Model when we get up to ipld-prime.

I've also ditched my custom printer and gone with json.MarshalIndent but I've had to make my own map[string]interface{}s to get it to print what how I want because of the some of the json tags that already exist in here and to be able to convert a few things to be how I need them. But it gives me forms that translate precisely to JS. They're mirrored here: https://github.com/ipld/js-dag-pb/blob/master/test/test-compat.js

There are two differences in the tests here and in JS where we do proper Data Model form checks:

  1. The case of empty Links array is indistinguishable from an omitted one, trying to make an empty array is an error in JS
  2. JS will check that the bytes of a Hash actually constitute a valid CID, this one doesn't.

We should do both of those things in go-ipld-prime-proto.

@mvdan ptal at my Go

Copy link
Contributor

@mvdan mvdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM. Using a map and then json sounds fine to me. You still have a bit of code, but there are less footguns when it comes to getting the encoding wrong. And the map gives you the "nil vs absent" difference that you want, I assume.

At the Go level, I left some comments below. They're mostly about making the tests a bit nicer to use and maintain, not really about what they do. The only other point that comes to mind is that each of the tests is just running verifyRoundTrip on different input/output, so you would save a lot of boilerplate by using table-driven tests: https://github.com/golang/go/wiki/TableDrivenTests. Up to you if it's worth it at this point, though.

pb/forms_test.go Outdated Show resolved Hide resolved
pb/forms_test.go Outdated Show resolved Hide resolved
pb/forms_test.go Outdated Show resolved Hide resolved
pb/forms_test.go Outdated Show resolved Hide resolved
pb/forms_test.go Outdated Show resolved Hide resolved
pb/forms_test.go Outdated Show resolved Hide resolved
@mvdan
Copy link
Contributor

mvdan commented Sep 24, 2020

I assume my approval isn't counting because I don't have enough permissions in the ipfs org. I'm not sure if that matters :)

@rvagg
Copy link
Member Author

rvagg commented Sep 25, 2020

Thanks for the link to TableDrivenTests -- I went googling "data driven testing golang" for this one because I felt the need. The problem I'm having is that some of these pieces of data can't be created as a top level var because they require more than one step (l := PBLink{}; n := PBNode{Links:[&l]} kind of thing). What's the recommendation for dealing with that situation because there's a lot of it in here?

@mvdan
Copy link
Contributor

mvdan commented Sep 25, 2020

Hm, I'm pretty sure you don't need multiple steps there; you could just do:

input: PBNode{Links: []*PBLink{
    {}, // note that you don't need to repeat the type here, thanks to the []*PBLink above
}},

If you really did need multiple steps, such as when creating a cycle, you could use a constructor func:

newInput: func() *PBNode {
    foo := &PBNode{...}
    bar := &PBNode{..., loop: foo}
    foo.loop = bar
    return foo
}

@rvagg
Copy link
Member Author

rvagg commented Sep 26, 2020

OK, I've got it sorted now. It's all the pointers that get in the way of pulling these things into a top-level variable. But I've solved that by also making the things that become pointers top level variables so I can & them by name. That seems to work well and I think it might add some clarity too because most of them are repeated and the names helps with identifying what they're for.

t.Logf() is a good addition too! (TIL).

How does it look now @mvdan?

@mvdan
Copy link
Contributor

mvdan commented Sep 28, 2020

Yeah, you can't do &"foo" in Go, which is perhaps unfortunate. Your use of globals seems fine to me. The other common alternative, especially for non-static data, is something like:

func strPtr(s string) *string { return &s }

var t = T{Pointer: strPtr("foo")}

Copy link
Contributor

@mvdan mvdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

@rvagg
Copy link
Member Author

rvagg commented Sep 30, 2020

@aschmahmann I suspect you might be the one to ping about getting this merged now? This doesn't change the implementation, just adds tests that explore the edges of the forms for the protobuf data. These are mirrored in the new dag-pb JavaScript implementation @ https://github.com/ipld/js-dag-pb/blob/master/test/test-compat.js and a new Go implementation currently @ https://github.com/rvagg/go-dagpb/blob/master/compat_test.go

@rvagg rvagg merged commit 8f475e5 into ipfs:master Oct 12, 2020
@rvagg
Copy link
Member Author

rvagg commented Oct 12, 2020

discovered I have commit access and can just merge this myself

@rvagg rvagg deleted the rvagg/dag-pb-forms branch October 12, 2020 06:29
@aschmahmann aschmahmann mentioned this pull request Aug 23, 2021
62 tasks
Jorropo pushed a commit to ipfs/boxo that referenced this pull request Mar 15, 2023
chore: add tests to verify allowable data layouts

This commit was moved from ipfs/go-merkledag@8f475e5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants