Skip to content
This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

DagCBOR: be more precise about foreign tags #227

Open
vmx opened this issue Nov 28, 2019 · 10 comments
Open

DagCBOR: be more precise about foreign tags #227

vmx opened this issue Nov 28, 2019 · 10 comments

Comments

@vmx
Copy link
Member

vmx commented Nov 28, 2019

In the DagCBOR spec, be more precise on what happens if you parse CBOR and it contains tags.

  • Should they be ignored?
  • Should they be preserved?
  • Should there be an error?

I suggest that we just ignore those tags and when you write DagCBOR, the only tag you ever write will be the one for CID (tag 42).

@rvagg
Copy link
Member

rvagg commented Nov 28, 2019

+1 on your suggested approach

@warpfork
Copy link
Contributor

I'd say there should be an error by default, and a "parseLoosely" method as an option.

Preserving the extra data without a way to read or to mutate it doesn't make for a pleasant user experience (nor an efficient library -- btw, cbor tags are technically allowed to contain other tags, and if you try to implement this you will be very very sad at how many allocs this can force you to make if you actually support it).

Ignoring the extra data means we're not really validating that it's DagCBOR. This is potentially dangerous. (Think: how the comment field in PDFs made it much, much easier to create hash collisions in (already weak) hashes. Even if it's not immediately broken, it's wading out towards the sharks in a very unnecessary way.)

Erroring explicitly is the choice that remains standing.


We can still also do the "parseLoosely" features in libraries in practice. This would buy us is being able to load CBOR documents that aren't exactly DagCBOR and manipulate them. But I don't think the specs themselves should hedge on this. And users should need to very much opt into such a usage: the number of cases where a user will want this, versus the number of cases where a user would be surprised how much bending-over-backwards the library is doing to support features they didn't need/want, is very leaning towards the latter, I'd bet.

@rvagg
Copy link
Member

rvagg commented Jan 1, 2020

To connect to other discussions happening elsewhere (no link, I thought it was in a team meeting but perhaps in a private discussion), there's also other parts of CBOR which can be "loose", e.g. the number 255 can be one of 18FF, 1900FF, 1A000000FF, 1B00000000000000FF. So while we're at considering what to do with tags outside of 42 we should also be considering whether we want there to be only one way of encoding data with CBOR for valid DagCBOR and perhaps our parsers all need to have a "strict" mode so that we can enforce that.

I think the remaining discussion on this is regarding how much it's worth the effort compared to everything else we're trying to do? It obviously has dedupe benefits but I'm still a little sceptical that the other reasons for doing this raise above the level of "it's just cleaner!".

/cc @ribasushi

@ribasushi
Copy link
Contributor

@rvagg actually deduplication is not the primary reason for wanting this. It is a part of maintaining the symmetry of "this cryptographic id represents this thing, and this thing is only represented by this cryptographic id" ( theoretical limits notwithstanding )

Yes, it is trivial to add a single padding byte somewhere and change every id. But it should not be possible to express bit-for-bit identical content under multiple cryptographic IDs. The simplest use-case I can think of are takedown requests.

Again it comes back to what is actually being built: a product or a platform. If the latter - I strongly believe any low-hanging fruit allowing disambiguation should be baked into the lowest levels of the fabric of said platform.

@rvagg
Copy link
Member

rvagg commented Jan 20, 2020

deterministic map ordering is in this rough category too, discussed in https://github.com/ipld/team-mgmt/blob/8d612e3ce1222de9dd531ca14bdeeb662bd2a3a1/meeting-notes/2020/2020-01-20--ipld-sync.md#notes, we need to make sure our codecs document rules for these while we're documenting these other "strict" encoding/decoding details.

@warpfork
Copy link
Contributor

Sounds like we should prepare to have a Data Model Spec Sprint Day or something...? I think we're starting to accumulate a decent number of these things and we should get em all on an agenda and knock em out.

@mikeal
Copy link
Contributor

mikeal commented Jan 22, 2020

ya, we need another close-a-thon in the specs repo in general, but it would probably be most productive to have at least one session focused on Data Model only.

@vmx
Copy link
Member Author

vmx commented Jan 22, 2020

@mikeal it's more a "getting-a-lot-done" rather then closing. Most open things are valid things that need some work (in the previous round we were able to close old/out of scope things).

@jonnycrunch
Copy link

so, just to give my thoughts on this subject.

In my use-case for the IPID DID spec that uses IPLD, it would be very nice to support tag 98 (COSE signing) of CID as tag 42. This provides me with all of the semantics and tooling necessary to accomplish my goal natively in CBOR and IPLD. I'm sure this could also be a method to help with data integrity for DAG sync. Also, @hsanjuan might find this approach interesting for shared clustering.

98(
  [
    / protected / h'a10300' / {
        \ content type \ 3:0
      } / , 
    / unprotected / {}, 
    / payload /  42: 'bafyreibtkfbzqiwdhd5bv53wrag43jbz4ytyixigde6wa5zuddxciq7s2m', 
    / signatures / [
      [
        / protected / h'a10126' / {
            \ alg \ 1:-7                    \\  alg = ECDSA 256 \\
          } / , 
        / unprotected / {
          / kid / 4:'did:ipid:12D3KooWDXDEZtLW5k16DjeHWkFZEeTKUF7PnzzY7m2UocmduhZV;/publicKey/0'
        }, 
        / signature / h'e2aeafd40d69d19dfe6e52077c5d7ff4e408282cbefb5d06cbf414af2e19d982ac45ac98b8544c908b4507de1e90b717c3d34816fe926a2b98f53afd2fa0f30a'
      ]
    ]
  ]
)

@rvagg
Copy link
Member

rvagg commented Jan 23, 2020

#236

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants