codec/dagcbor: add DecodeOptions.ExperimentalDeterminism #390
We did apply #386 by default and forcefully, but I imagine that one is extremely unlikely to break any users, whereas this one is likely to. |
The bad input is |
Yeah, I don't think we could do this by default. In general, Postel's principle seems to still be a good rule of thumb for us to follow, even in IPLD in contexts where we do care about canonicality more than most folks do.
(Edited: to put the more general comments first, for ease of readability as future "case law".)

In DAG-CBOR in practice in particular: there's way too much data in the wild with noncanonical map key order. (The genesis block of Filecoin is one example. That particular example shouldn't carry overwhelming weight, since it's most certainly one of many, but it's certainly amusing.) Noncanonical order is also extra likely in practice for DAG-CBOR because of the bizarreness of the CBOR RFC's choice of sorting algorithm; it catches people off guard all the time. That contributes significantly to the amount of data in the wild with noncanonical order, as well as to the odds of new such data appearing in the future when other people implement new libraries and systems.

We have also already found other places where noncanonical serial data in the wild became sufficiently common that we ended up deciding to accept it, for backwards compatibility's sake: for example, the CBOR undef byte, which we now quietly coerce to a null in the data model. (Decisions in: go; js; and even older go.) With any such secession, the horses have already bolted from the barn, and there's very little utility in trying to get some of them back in.

If we were looking at a codec that we had fully specified from scratch, and from its first day it had a single canonical form and all of its implementations (and any not-quite implementations, e.g. the relationship CBOR has to DAG-CBOR) were also strict from their first day, we'd be able to consider aggressively defending that ground. But unfortunately, that's not the scenario we have with DAG-CBOR. (Nor with DAG-JSON, if another similar example is useful to compare with.) |
(edited previous comment, to make it more legible as general "case law" for future reference.) |
FWIW I fully expected this default to be rejected, hence the failing tests and the "DO NOT MERGE". Tomorrow I'll update the PR to keep the strict mode opt-in on decoding, and update the fuzzer to not require the codecs to follow their own specs by default. One open question is what the option should be: "enforce spec strictness on decode" to do everything at once (map key order, canonical form for floats and integers, etc), or one option for each of those separately, just like |
Also, I know we've talked about this before, but: I think it's time to reconsider what the specs dictate. It's not good to have the dag-cbor core spec dictate strictness on decode when all current implementations do the opposite. The spec should be updated to reflect reality, given that it's not possible to update the defaults to reflect the spec. |
@mvdan: will you create the spec update PR? |
2c:
|
I've rebased this and updated the change to keep the same default and make the option a general "be strict" boolean. PTAL :) |
Filed ipld/ipld#196 so we don't forget. I haven't sent a PR as it's not a trivial change, and it's also not clear to me e.g. whether we want to apply the same thinking to other codecs. |
I've now got mixed feelings about this because it's so incomplete for something called "strict determinism". Reviewing the list off the top of my head to see how deep this can go:
Then there's all the little "you can't put that in there!" items, some of which refmt I think might reject already but we'd need to check:
Most of this reaches down into refmt.. but you get the idea that having something called "strict determinism" that only does map key ordering and nothing else is a bit weak. |
This is definitely incomplete, but a first step adding the API. I see two options:
2 is the most correct option, as you point out, but I also worry that it might be one of those things in go-ipld-prime that doesn't quite get finished in a long time, for reasons like having to reach into refmt. So my personal take is that having something as an end user is better than nothing :) Perhaps a middle ground is to call it |
This is similar to Rod Vagg's EncodeOptions.MapSortMode, which taught the encoder to canonically sort map keys as per the spec, but now we're doing the decode side. We add a single "strictness" knob, as it needs to deal with multiple bits like integers and floats, and not just map key order. Update the fuzzer to be aware of this setting.
Done, PTAL |
I didn't add a link to this thread because the TODO already links to #389 |
(see commit message)