Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Base 2, base 8, and base 10 #59

Open
Stebalien opened this issue Aug 1, 2019 · 10 comments
Open

Base 2, base 8, and base 10 #59

Stebalien opened this issue Aug 1, 2019 · 10 comments
Assignees

Comments

@Stebalien
Copy link
Member

The odd-balls in the current multibase spec are:

  • Base 2
  • Base 8
  • Base 10

That is, these are generally considered less useful than the other bases. The current situation is:

  • Base 2 is useful for bitfields.
  • One of base 8 or base 10 may be useful when only digits (0-9) are allowed.
    • Base 10 has a spec.
    • Base 10 is a more compact.
    • Base 8 may be simpler to decode/encode.

The question is: which of these should we keep, if any? This is relevant to multiformats/go-multibase#26 as, if we keep base 8, we need to define and implement it.

@vmx
Copy link
Member

vmx commented Aug 2, 2019

I'm in favour of reducing the burden on implementers. If it turns out that there's a base encoding that isn't part of the spec yet, we can add it later on. I'm for starting with a valuable small set of things and expand if needed (which might never be needed).

@richardschneider
Copy link
Contributor

@vmx, as a C# implementor, I unilaterally decided not to implement these bases. See richardschneider/net-ipfs-core#54

@fabianhjr
Copy link
Contributor

fabianhjr commented Oct 2, 2019

Base8 can encode/decode more efficiently. (Computationally efficient for large data)
Base10 uses less space but is more expensive to encode/decode. (Space efficient)

I would say both should be kept and a Base8 spec added.

@fabianhjr fabianhjr mentioned this issue Oct 2, 2019
@ribasushi
Copy link
Contributor

ribasushi commented Jun 3, 2020

Note for those following along. While go-multibase never gained a base8 encoding implementation, js-multibase is about to get fully-baked support for this. Notably from multiformats/js-multibase#55 (comment)

Note: base8 deviates from the spec tests outputs but aligns with #60

We should really make a decision here, and at least fix the shared-test-vectors to include only parts we expect implementations to support.

For easy-to-eyeball reference the current path taken by js-ipfs is: https://github.com/multiformats/js-multibase/blob/c8f762996e47403c0c41c4f16c35c7b252c4f31e/src/constants.js#L14-L39

refs:

/cc @vmx @rvagg @lidel @hugomrdias @creationix

@ribasushi ribasushi unassigned ghost Jun 3, 2020
@creationix
Copy link

Are there actually use cases where only decimal digits but a large or arbitrary number of digits is allowed?

I know of lots of places that store integers, but those have limits on size typically 32 or 64 bits which is way too small for hashes.

@rvagg
Copy link
Member

rvagg commented Jun 4, 2020

Same question as @creationix -- I find these bases only interesting for academic purposes and would love to know what real-world use-cases there might be, are we just doing completeness for completeness' sake?

Regarding the specific question, +1 on adopting what JS is doing now. The approach to base8 is consistent with the other bases so I think the change is correct and the test fixtures should change.

@fabianhjr
Copy link
Contributor

I think I have some time to move #60 along. sorry for stalling Q_Q

@ribasushi
Copy link
Contributor

I think I have some time to move #60 along. sorry for stalling Q_Q

No worries @fabianhjr! May I suggest pivoting a bit and reframing the spec into a generic "rfc4648-derived" spec covering base8, base16, base32 and base64? This way you can both abstractly define padded/non-padded variants and we can still get away by defining just the types we want implementations to support.

As a logical step 2 the base36 spec could be reworked into "base-X spec" to define base10,base36 and base58.

This code-block puts in perspective what I mean by "let's just have 2 generic specs":
https://github.com/multiformats/js-multibase/blob/c8f762996e47403c0c41c4f16c35c7b252c4f31e/src/constants.js#L14-L39

@fabianhjr
Copy link
Contributor

@ribasushi, pushed some changes to leave the simple mapping and mention it as RFC4648 derived.

This was referenced Jun 26, 2020
@sg495
Copy link
Contributor

sg495 commented Oct 15, 2021

Given that base2, base8 and base10 and base16 are all common bases for number literals, it would be good to have common behaviour when decoding non-canonical strings. As far as I understand, things currently stand as follows:

  • base2, base10 explicitly preserve leading zeros and encode/decode the trailing data;
  • base8 drops the last incomplete word;
  • base16 is somewhat ambiguous, because rfc4648 Section 3.5 does not mandate a specific behaviour for decoders.

In my opinion, the expected behaviour for these encodings should be the same: preserve leading zeros, then encode/decode using the given base (and choice of alphabet for digits). This is effectively the same as zero-padding bits to the left, and is the same behaviour as base36 and base58.

However, base16 is described by rfc4648 Section 8 as being analogous to base32 and base64, and the latter both mandate zero-padding of bits to the right when necessary to complete a bit group in encoding. Furthermore, rfc4648 Section 3.5 mentions that encodings with non-zero bit padding MAY be rejected by decoders, from which one might deduce that the intended behaviour for decoders was also to consider zero-padding of bits to the right. However, this is not explicitly mandated, as far as I understand.

The above suggests three possible choices when decoding odd-length strings in base16:

  • reject them, as done by the the base64 module of Python;
  • zero-pad them to the right, as rfc4648 might seem to indicate;
  • zero-pad them to the left, analogously to base2 and base10 (which should also be what base8 does, IMHO).

@daviddias daviddias removed their assignment Jan 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants