feat: representation of full uint64 range in & out of cbor #413

rvagg · 2022-05-12T11:10:17Z

I don't think I'd dare do this without Eric's OK so it's a draft for now. This should allow us to access the full positive uint64 range of integers, although you can only get to it by type checking and accessing some custom methods.

I'm mainly thinking about this for bindnode, when you have a uint64 you can actually encode and decode the full range rather than overflowing the signed bit when casting to int64 for the highest range of integers. The awkward cast to the uintNode shaped type wouldn't be a big deal tucked away in bindnode and it's not really something we have to advertise publicly unless folks complain and want their full range.

This doesn't handle the negative uint64 range, which cbor can do, that would require some plumbing through refmt which currently errors when it encounters a negint below int64 range.

rvagg · 2022-05-12T12:37:42Z

A better approach might be to leave plainInt unmolested and make a new plainUint that does what we need. It could even be reserved for the overflow cases only in the codec, with ints within the int64 range just getting an AssignInt(). So this is just a special case that can be sniffed for where it's really needed.

The existing TODO would still be a concern though, calling AsInt() will be an overflow. So maybe it should return an error in that case. Silently overflowing is probably not a good feature to keep baked in.

rvagg · 2022-05-13T01:22:00Z

As per @hannahhoward's suggestion, I added a public interface like LargeBytes, called UintNode, with an AsUint() method. I've also refactored as per my last comment and I've put plainInt back to how it was (a simple boxed int64) but given it UintNode support. I've also added a uintNode type that implements the full uint path, with an internal uint64 and a positive bool. I've also used this type exclusively for the >int64 case, so a dagcbor decode will mostly still use a simple AssignInt() but above that range will AssignNode(NewUint(v, pos)). So in terms of perf, the main overhead now is the type check for UintNode on marshalling integers; otherwise existing code paths are mostly the same; we're just adding additional functionality.

I also did dagjson, but had to shunt that to a branch because I discovered refmt's json handling doesn't deal with TUint token types. This should be a relatively straightforward fix in refmt, but also a low priority I think. https://github.com/ipld/go-ipld-prime/tree/rvagg/uint64-dagjson.

rvagg · 2022-05-30T05:25:04Z

I'm considering removing the possibility of negative uints from this - CBOR enables this, you can encode a negative uint and get it out safely, but of course at the language level we need a second signal to say that this value should be negative, and it's a little bit weird. If you're dealing with negative numbers and need to resort to this kind of thing, then you're probably already doing this at an application level and are unlikely to need it at the encoding level I suspect. You probably either already have a "is positive" boolean hanging around (and are likely even going to encode it in your data structure adjacent to the uint value), or it's always known to be negative and therefore no signal is needed.

In most languages we're used to not thinking about negative uints, we work around that in other ways (like just using signed ints).

hannahhoward · 2022-05-30T06:20:14Z

My primary comment was going to be about the support of negative uint64s, so I guess +1 to removing them. It seems odd.

Especially given we use refmt -- as I read the code, we don't actually even have access to the underlying CBOR in the marshal/unmarshal code, just the token stream, which also appears, by my read, to not really support the negative 2^64 range.

hannahhoward · 2022-05-30T06:21:18Z

finally I'm pretty sure cbor-gen doesn't support 2^64 negative ints, and compatibility with cbor-gen is the primary thing driving our use case here anyway.

rvagg · 2022-05-31T06:53:31Z

I removed the positive bool from this, so it can only work on positive uints. One side effect of this is that a plain int node can't act as a UintNode—it could, but it would error every time it's negative, which becomes a problem in the codec which first needs to check for UintNode, then call AsUnit() and then for negative integers it has to deal with the error case; which is a bit too inefficient.

And now I have mixed feelings about this—it's not as universal as I hoped this would be, I can't treat any integer as a uint and just use the sign to cover the full range. There's now a much clearer bifurcation between integer types, even when you're dealing with a positive int that's within the uint64 range.

There remains the option of interpreting every positive integer out of the codec into a UintNode, but I don't know if that buys us much.

rvagg · 2022-06-08T04:41:26Z

@hannahhoward @willscott want to give a (hopefully final?) review here?

I've made a trade-off choice here for decoding: we AssignNode(UintNode{v}) only when we encounter integers >MaxInt64. And the Int nodes that are within int64 range don't get to conform to the UintNode interface. One reason for this is that we get to take advantage of direct AssignInt() calls on builders for most positive integers, which should be more efficient in most cases. But it leads to asymmetries and quicks, but we have to pick a side on this:

only a the top part of the uint64 range will be able to act as a UintNode
something can be a Kind_Int but AsInt() errors
you can use a type check for UintNode to see whether these top-end values are there and available

An alternative is that a Int node implementer, such as basicnode.plainInt, could conform to the UintNode interface and error on AsUint(), but you'd lose the ability to use the type check to see if it's available, you have to call it an error instead.

So we get this set of conditions if we went the other way:

all integer nodes (from basicnode anyway) get to act as a UintNode (simple change to basicnode.plainInt).
something can be a UintNode but AsUint() errors
you can't use a type check for UintNode to see whether these top-end values are there and available, you have to check error on AsUint()

For now, I'm considering this mostly an internal concern, for bindnode, so keeping current code paths mostly unmolested is ideal. For other cases you're going to encounter errors out of range, at some point. e.g. in a codegen TypedPrototype build, the AssignNode(UintNode{}) is going to find its way to an AsInt() call somewhere which should error as being out of range (which is better than the current state because it overflows silently).

rvagg · 2022-06-10T07:50:44Z

Too many branches stacking up on each other and this is at the bottom of them so I'm going to pull the trigger and merge 🤞

rvagg force-pushed the rvagg/uint64 branch from ee8733f to 1a2ecef Compare May 13, 2022 01:23

rvagg force-pushed the rvagg/uint64 branch from 1a2ecef to 6c64f2d Compare May 24, 2022 01:03

rvagg mentioned this pull request May 24, 2022

feat(bindnode): support full uint64 range #429

Closed

rvagg marked this pull request as ready for review May 31, 2022 06:49

BigLep assigned rvagg Jun 7, 2022

BigLep requested a review from hannahhoward June 7, 2022 22:10

rvagg force-pushed the rvagg/uint64 branch from 166435d to 5585719 Compare June 8, 2022 03:35

rvagg requested a review from willscott June 8, 2022 04:41

rvagg force-pushed the rvagg/uint64 branch from e1c3911 to 0d20396 Compare June 8, 2022 04:42

feat: introduce UIntNode interface, used within DAG-CBOR codec

464a118

rvagg force-pushed the rvagg/uint64 branch from 0d20396 to 464a118 Compare June 8, 2022 04:42

willscott approved these changes Jun 8, 2022

View reviewed changes

rvagg merged commit e424b72 into master Jun 10, 2022

rvagg deleted the rvagg/uint64 branch June 10, 2022 07:50

aschmahmann mentioned this pull request Jun 29, 2022

spec: initial WAC spec ipld/ipld#226

Open

rvagg mentioned this pull request Nov 28, 2023

feat: add car and car generator ipld/codec-fixtures#102

Merged

lidel mentioned this pull request Nov 29, 2023

dag import: errors with IPLD test vectors ipfs/kubo#10236

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: representation of full uint64 range in & out of cbor #413

feat: representation of full uint64 range in & out of cbor #413

rvagg commented May 12, 2022

rvagg commented May 12, 2022

rvagg commented May 13, 2022

rvagg commented May 30, 2022

hannahhoward commented May 30, 2022

hannahhoward commented May 30, 2022

rvagg commented May 31, 2022

rvagg commented Jun 8, 2022

rvagg commented Jun 10, 2022

feat: representation of full uint64 range in & out of cbor #413

feat: representation of full uint64 range in & out of cbor #413

Conversation

rvagg commented May 12, 2022

rvagg commented May 12, 2022

rvagg commented May 13, 2022

rvagg commented May 30, 2022

hannahhoward commented May 30, 2022

hannahhoward commented May 30, 2022

rvagg commented May 31, 2022

rvagg commented Jun 8, 2022

rvagg commented Jun 10, 2022