Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(codecs): coerce cid.Undef to null in dag{json,cbor} output #433

Merged
merged 1 commit into from
Jun 1, 2022

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented May 24, 2022

Otherwise we end up with invalid output like this:

dag-cbor: 0xd82a4100

  d8 2a                                           #   tag(42)
    41                                            #     bytes(1)
      00                                          #       "\x00"

dag-json: {"/":"b"}

Fixes: #246


@aschmahmann @willscott I'd appreciate your 👍 on this as it's a coercion rather than an error or other funky behaviour. But this probably isn't that simple:

  • Unlike string, or int, the zero-value of CID doesn't have a round-trippable form that we've come up with for any of our codecs, so we're essentially making one up here, null. The alternative may be to define a null CID for both of these codecs - we could just use the values above and make it work across our current codec implementations (currently in JS, both of these result in errors because it can't decode a varint where it expects, IIRC round-tripping in Go does essentially the same thing).
  • For typed nodes, this coercion could have blow-back if you have a link field that's not nullable or optional, then a round-trip with a cid.Undef is going to result in a null in that position which might be a problem. But we could push that problem up the stack and make it a concern for bindnode and friends to figure out. They could even use cid.Undef for non-nullable non-optional fields that come out as null.
  • We have the option of erroring on this if we want - make it a user problem, don't give us cid.Undef, it's not allowed.

(Related discussion in #191 and the currently rejected PR that would error on Absent fields: #196).

@Stebalien @vmx you might want to weigh in too because we're talking possible codec behaviour changes.

@rvagg rvagg self-assigned this May 24, 2022
@vmx
Copy link
Member

vmx commented May 24, 2022

I'm not sure I've grasp the full impact of this. But serializing a cid.Undef to null makes sense to me.

@rvagg
Copy link
Member Author

rvagg commented May 24, 2022

I'm thinking through what I might do in JS if we made it OK to have these zero-CIDs in the codecs, and I don't think I have the ability to round-trip a null CID, we don't have (and haven't needed) an equivalent of Undef in JS, null works just fine. So if we got a 0xd82a4100 and treated that as valid, the user would get a null and we'd lose the ability to round-trip. Unless we introduced our own form of Undef in JS, which would probably be a little chaotic.

@vmx Undef is normally used as the "unset" / zero form of a CID when it's not a pointer, so can't be nil. The equivalent of "" for the zero form of a string or 0 for an int. We currently have a gap in the implementation that it will try and serialise it but it's really just an empty byte array, which isn't a CID at all. So we have to do something.

@rvagg
Copy link
Member Author

rvagg commented May 24, 2022

Here's a case for just erroring—the old go-ipld-cbor will if it gets an Undef: https://github.com/ipfs/go-ipld-cbor/blob/3c41942dd2857bcf82bea83160439ef8f1cac5c9/node.go#L541

@vmx
Copy link
Member

vmx commented May 24, 2022

@rvagg I'm not sure I'm missing something here. I don't understand the relation to codecs. It seems to be a Go specific issue. I wouldn't expect codecs to support/round-trip invalid CIDs and I also think we should try as hard as possible to not leak language specific things into codecs.

@aschmahmann
Copy link

@rvagg what's the motivation to not error? IIUC the older libraries error here and it's not clear why anyone would intentionally want to encode a null CID rather than that being an application bug.

Is there a bunch of this malformed data running around in the wild that we'd want to decode without erroring?

@willscott
Copy link
Member

I also have mixed feelings on this one. Having errors when nil/undef cids is a good indication that you should be checking and converting that case into the null case of a nullable field. That said, it is boilerplate that gets added in at all those places and does seem right that we can just automatically interpret that to the null case when possible.

@rvagg
Copy link
Member Author

rvagg commented May 25, 2022

OK, no appetite for changing codec specs to allow some form of zero-CID, makes this a bit easier.

Error vs coerce: I coming down on the side of erroring - it may lead to boilerplate, but if you end up with an undefined CID maybe it should have been a pointer and therefore be optional or nullable, in which case this would be taken care of without the need for boilerplate. Plus, is it likely that if you're allowing an undefined CID to get this low down in the stack that maybe you've missed adding some checking much much further up the stack?

Erroring does give us some good symmetry as well, there's no magic or surprises and ensures round-trip stability. Make your input the same as what you should get on the output, and we're not making undefined CIDs anywhere on the way out, it'll just be a null Node.

So latest commit makes it an error; reviews or further discussion?

Otherwise we end up with invalid output.

dag-cbor: 0xd82a4100

  d8 2a                                           #   tag(42)
    41                                            #     bytes(1)
      00                                          #       "\x00"

dag-json: {"/":"b"}

Fixes: #246
@rvagg rvagg merged commit face7e9 into master Jun 1, 2022
@rvagg rvagg deleted the rvagg/codec-undef-cid branch June 1, 2022 03:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🎉 Done
Development

Successfully merging this pull request may close these issues.

Cid.Undef IPLD de-serialization error
4 participants