-
Notifications
You must be signed in to change notification settings - Fork 108
dag-pb: more accurate schema and up-to-date impl description #297
Conversation
Also pushed in an update that adds the sorting order to |
OK, did some investigation with Go to see what it's doing at the lowest level and that resulted in ipfs/go-merkledag#58 and some winding back of most of the proposals in here. The only real one that changes the schema now is:
Plus I've added some notes below with further constraints that can't be expressed with the schema. Particularly that I was going to use go-ipld-prime-proto as a reference but noticed this in the README:
It really is minimal and defers mostly to go-merkledag and has some very rough rules about what it's doing with the data. So what we define here should be the rules used when go-ipld-prime-proto is properly fleshed out so it's worth getting this right. I've also added a section about a zero-length block being valid. Both current Go and JS implementations will decode this into |
ok! I've had my head in this all week and I'm much more comfortable with where this is at now. I've reversed a bit from the previous comment, the schema uses
Implementation stuff:
|
OK, after sleeping on it and then talking it through this morning in the meeting and with Mikeal I've decided to even wind back the I've also added that a block with "unrecognized" data should also be rejected. This is a should because most PB decoders will silently accept extraneous data. In go-merkledag there's a I'm about to go enforce this on the new JS DAG-PB library. Maybe we'll find some source of data out there that has these bad blocks, but until we see them in the wild we can afford to be strict in our newer implementations. |
block-layer/codecs/dag-pb.md
Outdated
Name optional String | ||
Tsize optional Int | ||
} | ||
|
||
type PBNode struct { | ||
Links optional [PBLink] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understood correctly that the protobuf wire format doesn't disambiguate between a lack of a list here vs a zero length list... maybe it would be best to describe it in our schemas as Links [PBLink]
without the optional modifer on the field?
If we have the optional
modifier on the field, that means it adds Absent
to the cardinality (it +1's the cardinality), but we really have no way of taking away the empty list as a valid value (if we could, that'd be -1 on the cardinality, and balance things -- but we don't have such an ability).
Just having the codec handle the translation from {empty list}<->{absense of encoded bytes} seems correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so I've been wanting to lean this way but the thing that's stopped me is that it creates a lot of situations where you're forcing the use (and allocation) of an empty list where you don't want anything. For decoding a dag-pb block with just data in it you're going to need to throw in an extra array just to satisfy this case. When creating an object manually in our APIs you're going to have to allocate an array. In JavaScript this means that I have to force users to give me { Data: [...], Links: [] }
when they're just wanting to ingest terminal data blocks. This isn't a huge inconvenience as I've created this prepare()
method to help get the forms right before instantiating, but it adds a level of hostility that I'm trying to avoid. I think in ipld-prime you'd have data structures that have an empty array for Links
when you create a new PBNode
anyway so I suppose you get around this hostility fairly easily.
But.. I take your point on cardinality. And the reason why I've been tempted in this direction is that there's something to be said on the other end of this API where you always get an array in that position and don't need to check for its existence. If I can write a traversal that safely assumes Links
is always there then it's much nicer than the incessant checking.
I'm going to ponder this today and maybe try out some code.
OK, I'm going with @warpfork's argument about I've done another pass of the doc and cleaned up the clarifications to the schema. I've also added notes in the Protobuf schema about our recommendations for strict decoding. https://github.com/ipld/js-dag-pb now matches this. I'm going to work on a Go form too I think now that I have a good test suite but it'll just be a simple replacement to go-merkledag/pb, I don't think I can manage the ipld-prime piece of it and the question of whether to integrate it in ipld-prime-proto is still up for discussion. So I'll just go with the low-level pieces for now. |
1d1e29d
to
f5228fb
Compare
f5228fb
to
93ffb70
Compare
Round 1,490 of DAG-PB specification cleanup..
I'm working on getting this just right in https://github.com/ipld/js-dag-pb/ and am taking the opportunity to be very strict about Data Model representation. We have the challenge of wanting to ensure that we have previously accepted some flexibility in the way the components of a DAG-PB were constructed (friendly API) but now we want to be much more strict - you must give us an object of exactly the right shape as defined by the Schema such that if we round-trip it it'll come out the same way (while also offering utilities on the side to regain some of the friendly API functionality).
Thanks to Hannah's efforts in https://github.com/ipld/go-ipld-prime-proto we have a good chance at full hamonisation of the two implementations! go-ipld-prime-proto has a much more strict interpretation of the Data Model form than the previous Go implementation (which offers the shortcut named pathing form). So I want to try and get that impl and the new JS impl aligned and the Schema here reflecting it correctly. I don't think we quite got it right the last time around, which was hampered by having to conform to the varying legacy implementations and hand-waving about Data Model form. I think we can be much clearer now that we have a clearer concept of the Data Model in the new implementations.
I'm looking mainly at https://github.com/ipld/go-ipld-prime-proto/blob/master/coding.go and its use of https://github.com/ipfs/go-merkledag/tree/master/pb to pull in properties to inform some of the Schema changes, I'd appreciate double-checking on that please. I plan to conform JS to this.
Some additional notes inline.