Canonical form #5
Comments
In my opinion it might be worth considering making them invalid and thus preventing their use. Since these varints won't be user input, they can only be created by software, and accepting them can lead to software that produces them even though they are discouraged.
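A decoder can enforce that choice directly. Below is a minimal Go sketch, using only the standard library's LEB128 routines and assuming the spec's varint matches them; the function name `decodeStrict` is hypothetical. It relies on the fact that a non-minimal encoding longer than one byte always ends in a `0x00` byte (a superfluous most-significant group):

```go
package varint

import (
	"encoding/binary"
	"errors"
)

// decodeStrict decodes an unsigned base-128 varint from the start of buf
// and rejects any encoding that is not the minimal (canonical) one.
func decodeStrict(buf []byte) (uint64, int, error) {
	x, n := binary.Uvarint(buf)
	if n <= 0 {
		return 0, 0, errors.New("varint: truncated or overflowing")
	}
	// In a minimal encoding, the final byte (continuation bit cleared)
	// carries the most significant group and is never zero, except for
	// the value zero itself, which is a single 0x00 byte.
	if n > 1 && buf[n-1] == 0x00 {
		return 0, 0, errors.New("varint: non-minimal encoding (superfluous zero groups)")
	}
	return x, n, nil
}
```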
I do know that:

```
o1 = <user-object>
d1 = protobuf.EncodeWithSuperfluousZeros(o1)
o2 = protobuf.Decode(d1)
d2 = protobuf.Encode(o2)
bytes.Equal(d1, d2)  # may be false !!!
```

and this is definitely wonky. But I haven't thought through the alternatives enough to know that similar or worse wonkiness isn't to be found on the other end. Have you? What makes you absolutely certain that no edge-case wonkiness like this exists on the other side?
I haven't found any software currently that does case (b), but that doesn't exclude the chance that someone will create such software. I am unsure about the implications, apart from infinitely many addresses for exactly the same object, and possibly increased storage size if some software ever produces such an address in a link, since de-duplication of linked objects won't work.
In multiformats/multicodec#16 @diasdavid wanted to define SHA1 as: […]. The issue itself isn't really about protobufs, but about varints throughout the multiformats suite.
I changed the varint in golang/protobuf, so these may not be the same values now.
If different unsigned-varints can decode to the same natural number, the specification should address how unsigned-varints are compared for equality. Without that, developers may be tempted to compare at the byte level, and there could be multiple (and possibly conflicting) identifiers for the same thing. A canonical encoding would make byte-level comparisons acceptable.

To clarify what I (and I think @Kubuxu) mean by canonical encoding: given the functions `encode : N -> bytes` and `decode : bytes -> N`, the encoding is canonical if `decode(encode(n)) == n` for all `n` and `encode(decode(b)) == b` for all valid `b`. However, some bytes may be invalid and not map to a natural number at all.
Parsers tend to be flexible in what they consume, even accepting undesirable encodings. This strategy can be beneficial where the intent of the encoded information won't be misconstrued. However, in this case, where unsigned-varints are intended to signify identity, it's especially important to have a cohesive, explicit strategy in the spec. If undesirable unsigned-varint byte arrays (e.g. ones padded with superfluous zero groups) are going to be accepted, the spec should state whether they are treated as equal to their canonical form or rejected outright.
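As a sketch of how byte-level comparison could be made safe, the helper below (written in Go against the standard library's LEB128 routines; `isCanonical` is an illustrative name, not an existing API) accepts a byte string only if re-encoding its decoded value reproduces it exactly:

```go
package varint

import (
	"bytes"
	"encoding/binary"
)

// isCanonical reports whether buf is exactly the minimal encoding of the
// value it represents, i.e. decoding and re-encoding yields buf again.
func isCanonical(buf []byte) bool {
	x, n := binary.Uvarint(buf)
	if n <= 0 || n != len(buf) {
		return false // truncated, overflowing, or trailing bytes
	}
	re := make([]byte, binary.MaxVarintLen64)
	m := binary.PutUvarint(re, x)
	return bytes.Equal(buf, re[:m])
}
```

With every stored identifier required to pass such a check, plain `bytes.Equal` would suffice for equality.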
This has been fixed.
Base 128 varints, aka Protobuf varints, can in theory be expanded to more than one byte even when there is no need for it. Example: `0000 0001` is one, and `1000 0001, 0000 0000` is also one. We should decide what to do with such encodings: allow them to be stored and used, accept them but canonicalize them on the fly, or disallow them entirely to prevent hash ambiguity.
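To make the ambiguity concrete, here is a small illustrative Go program (assuming protobuf-style LEB128 as implemented by `encoding/binary`) showing that the minimal and the zero-padded encodings decode to the same number even though their bytes differ:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	minimal := []byte{0x01}      // 0000 0001
	padded := []byte{0x81, 0x00} // 1000 0001, 0000 0000 (superfluous zero group)

	v1, _ := binary.Uvarint(minimal)
	v2, _ := binary.Uvarint(padded)

	fmt.Println(v1, v2) // prints: 1 1
}
```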