CodexQR and Codex32 QR for computer readable Codex32 strings #66

BenWestgate · 2023-09-16T14:50:51Z

Stumbled across this:
https://help.blockstream.com/hc/en-us/articles/10426338118169-What-is-a-SeedQR-

It's easy to type a 128-bit codex32 share, but may not be without a keyboard.

There's no reason codex32 shares cannot be encoded in compact QRs, but we also need to at minimum store the threshold, identifier and index.

BIP39 SeedQR has two levels:

Standard: Uses the index of each recovery phrase word (based on the BIP39 wordlist) and concatenates them into one long stream of digits. This specification can be scanned by any QR code reader and fairly easily converted back to your list of words.
Compact: Expresses the index of each word in binary instead, making the QR code 35-40% smaller. This specification is much harder to read and interpret without the proper software. Blockstream Jade allows export of this type of SeedQR due to these benefits.

For codex32 shares:

Standard: might as well be encoding the entire uppercase codex32 string in a low error correction QR, just like addresses.

Compact: is more interesting, here we have 3-bits for the threshold, 20-bits for the identifier and 5-bits for the index, plus 130-bits, 260-bits or 515-bits for the payload.

128-bit shares:

The most compact representation is 20 bytes if the codex32 checksum is dropped, which is safe to do since QR has its own.

20-bytes means it can be encoded by taking threshold+identifier+index+payload as a list of base32 values with length 32 and then converting to bytes.

256-bit shares:

3-bits threshold, 20-bits identifier, 5-bits index, 260-bits payload = 288 bits = 36 bytes

Here going from a list of base32 values per character to bytes ends up needing 37 bytes.

512-bit shares:

Here also, using 3-bits for the threshold allows encoding this in 68 bytes. While using a base32 character list to bytes requires 69 bytes.
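
The byte counts for the three share sizes can be checked with a short calculation. This is only a sketch: the function names are made up for illustration, and the field widths are the ones quoted above (20-bit identifier, 5-bit index, 130/260/515-bit payload).

```python
from math import ceil

def packed_bytes(payload_bits: int, threshold_bits: int = 3) -> int:
    """Bytes needed for threshold + 20-bit identifier + 5-bit index + payload."""
    total_bits = threshold_bits + 20 + 5 + payload_bits
    return ceil(total_bits / 8)

def base32_bytes(payload_bits: int) -> int:
    """Same layout, but the threshold keeps its full 5-bit character."""
    return packed_bytes(payload_bits, threshold_bits=5)

for seed_bits, payload_bits in ((128, 130), (256, 260), (512, 515)):
    # 128 -> 20 vs 20, 256 -> 36 vs 37, 512 -> 68 vs 69
    print(seed_bits, packed_bytes(payload_bits), base32_bytes(payload_bits))
```

The 3-bit threshold only saves a byte for the 256-bit and 512-bit shares; for 128-bit shares both layouts land on 20 bytes.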

Do you prefer serializing the threshold as 3 bits or as 5 bits to keep the character mapping intact, knowing it adds 10 plus pixels to the 256-bit and 512-bit CodexQR?

Error Correction level:

The codex32 checksum allows restoring ~27% of the data for 128-bit seeds, 17% for 256-bit seeds and 11% for 512-bit seeds.

Error Correction	Redundancy
Level M (Medium)	15% of data bytes can be restored.
Level Q (Quartile)	25% of data bytes can be restored.
Level H (High)	30% of data bytes can be restored.

Using Level H for 128-bit, Level Q for 256-bit and Level M for 512-bit gives better redundancy than codex32.

I am unsure what SeedQR is using.

Should we copy SeedQR, use the levels above that are at least as good at correcting errors as the original codex32 or use the least error correction that easily scans?

Conclusion

While this probably shouldn't be a replacement for writing the codex32 string, it is much faster and less error prone to enter. And time spent drawing easily pays for itself after just a couple scans.

It looks like it's possible to decode QR codes without errors back to binary, as well as draw one without any error correction from binary. I believe the Reed-Solomon coding is over GF(2^8), so it's not as paper-computer friendly (paper-computer impossible?).

Either way, whenever a share is imported, it'd be a nice option to get the QR displayed. Once a codex32 secret is recovered, all CodexQRs could be generated for supplementing the backups.

Should I add this functionality to Bails?

I looked into OCR and there were too many dependencies, but QR scanning is easy. It seems like a nice option to sidestep the unintelligible handwriting problem of importing other people's shares.

apoelstra · 2023-09-16T15:28:40Z

Maintainer

Do you prefer serializing the threshold as 3 bits or as 5 bits to keep the character mapping intact, knowing it adds 10 plus pixels to the 256-bit and 512-bit CodexQR?

I think 3 bits is fine. If we're dropping the codex32 checksum anyway it already requires post-processing to get the original string back so I think there's no harm in messing up the characters.

Should we copy SeedQR, use the levels above that are at least as good at correcting errors as the original codex32 or use the least error correction that easily scans?

I like the idea of having a "standard" and "compact" version. I think "standard" should max everything out and "compact" should minimize them. And clarify that the 'compact' version is only for ephemeral uses; it shouldn't be printed etc.

But TBH I rarely use QR codes and don't have a good intuition for them.

2 replies

BenWestgate Sep 16, 2023
Author

I counted the pixels in 12 word SeedQR and it’s a version 1, the smallest size.

We cannot fit a 128-bit share in v1 ECC level L compact CodexQR without leaving out some of the identifier.

It looks like 33x33 (1089 pixels!?!) encodes the share as a string with the original checksum and 30% error correction.

My numbers above were wrong, QR is correcting errors at those rates, not erasures. So much lower levels are needed to match the removed checksum in the compact format.

BenWestgate Sep 16, 2023
Author

I only have 17 bytes to work with for 21x21 ECC level L, which leaves 6-bits for the header data, 5 must be the index, leaves 1-bit for the threshold and identifier.
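
The bit budget for the smallest QR can be written out as a quick check. A minimal sketch, assuming the 17-byte capacity quoted above and the 130-bit payload of a 128-bit share; the names are illustrative only.

```python
V1_L_BYTE_CAPACITY = 17  # bytes in a 21x21 QR at ECC level L, as stated above

def header_bits_left(capacity_bytes: int, payload_bits: int = 130) -> int:
    """Bits left for header data once the share payload is stored."""
    return capacity_bytes * 8 - payload_bits

left = header_bits_left(V1_L_BYTE_CAPACITY)  # 136 - 130 = 6
after_index = left - 5                       # 1 bit for threshold and identifier
print(left, after_index)
```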

I feel like that's just barely enough for the decoder to know when to import the seed if:

A "special" padding on codex32 secret was used, and especially if the secret can derive the shares or at least the indices of shares that also have "special" padding. The bip32 fingerprint if known could be another check to avoid prompts or invalid combinations.
The extra bit forms a binary error correcting code of the 3-bit threshold, after 2 shares, there will be only 2 possible thresholds and if both are >2 it knows to keep scanning.
When the two threshold options include threshold 2, AND weak checks from above pass for threshold=2, it might ask the user to confirm the 4th digit of their codex32 share. Most (3/4) of the time it could transparently skip this prompt.

OR

These checks aren't all possible for shares generated by hand. Relying on the threshold bit error correction code alone would always prompt when two threshold 2 shares are scanned. With custom encoding, it could never prompt for two CodexQRs scanned with threshold 3, 4, 5 or 6. And would prompt 1/3 of times that two threshold 7, 8 and 9 are scanned.
The prompts could be eliminated entirely for threshold 2 if support for threshold 7, 8 and 9 are dropped from Compact CodexQR. Is this the best trade-off?
Additionally, a bit or two could be discarded from the share index, making only 8 or 16 share indexes compatible with Compact CodexQR, for example discarding the MSB or fixing one of the less significant bits to "1" eliminating "s" or "16" as a possibility:

or to improve compatibility with the codex32 book, sort the first 8 to 16 shares A, C, D... for 3 or 4 bits. This allows not dropping threshold 7, 8, 9 and learning 1-bit of identifier in two shares or learning 2 to 4-bits of identifier when combined with dropping those thresholds. 4-bits of identifier in 2 shares offers some practical prevention against wrong combinations.
The threshold could be restricted to just (2,3) (as per recommendations) this allows learning 1-bit of identifier for 32 share indexes, 3-bit of identifier for 16 share indexes and 5-bit of identifier for 8 share indexes.

What do you think is the best use of the 17-byte space?

The next size up is 25x25 ECC level Q which has 20-bytes to store the full header and payload in one QR and repair 25% damage.

But is it worth 64% more drawing to include the full header for "Compact" CodexQR? I'm imagining these always stored along with the handwritten codex32 string. If they're too much a hassle to draw, people won't use them at all.

I watched the SeedQR author draw a 25x25 in 8 minutes here:
https://www.youtube.com/watch?v=c1-PqTNx1vc

So the 21x21 can be drawn at least 3 minutes faster per share, potentially saving half an hour for 9 Compact CodexQRs.

apoelstra · 2023-09-17T13:47:35Z

Maintainer

I only have 17 bytes to work with for 21x21 ECC level L, which leaves 6-bits for the header data, 5 must be the index, leaves 1-bit for the threshold and identifier.

...

I'm imagining these always stored along with the handwritten codex32 string. If they're too much a hassle to draw, people won't use them at all.

Yeah. I didn't read your whole post, and am trusting you when you say that the QR checksum is much stronger than codex32 even at low settings (which isn't that surprising to me; QR codes seem to work really well in pretty hostile environments .. at least, scanning them works well). But I agree that the header, or at least the identifier and maybe threshold, is fine to just write in plaintext somewhere on the QR device. And then the user would need to manually type it when scanning the QR code. Which sucks ... but sucks much less than making them create a far larger QR.

4 replies

BenWestgate Sep 17, 2023
Author

...am trusting you when you say that the QR checksum is much stronger than codex32 even at low settings.

Level "L" corrects 7% byte errors and the codex32 checksum corrects 4/48 = 8% (someday...), but manual entry introduces many more errors than scanning.

And then the user would need to manually type it when scanning the QR code. Which sucks ... but sucks much less than making them create a far larger QR.

It's avoidable subject to a few conditions:

If codex32 identifier is bip32 fingerprint, it can be regenerated from a threshold of QRs.

If the thresholds are restricted to 2, 3, 4, 5 and 6, then implementations can avoid asking for threshold, two QRs reveal enough to determine if t == 2 and a third QR enough to fully determine the threshold.

If the threshold is further restricted to either 2 or 3, the free bit per QR can store it plus 1-bit of ID across 2 compact CodexQRs.

If compatible share indexes were cut to first 16 letters, all of the above plus t bits of ID.
If compatible share indexes were cut to first 8 letters, all of the above plus 2 * t bits of ID.

If the padding were set as ECC of the threshold, identifier and byte payload, (like my electronic implementation), it effectively gives 2 more bits of info per QR (after a threshold) from reading this ecc. Gains 4-bits of mistake detection (like 12-word bip39) against wrong QR combinations.

We can do a lot better if we add a SLIP39 style digest to a share:

We propose that given a secret, T − 2 shares be generated randomly and the remaining shares be computed in such a way that f(255) encodes the shared secret and f(254) encodes the digest of the shared secret. Encoding the digest makes it possible to verify that the shared secret has been correctly recovered.

...share index 254 is used to encode the digest of the shared secret S. The share value D corresponding to index 254 consists of two parts. The first 4 bytes of D encode the actual digest and the remaining n − 4 bytes R are randomly generated. The digest is computed as the first four bytes of HMAC-SHA256(key=R, msg=S). Encoding the digest makes it possible to detect an invalid set of shares with a random failure chance of 2^−32.
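
As a rough illustration of the SLIP39 digest construction quoted above, here is a sketch in Python. It only covers computing and checking the digest share value D; the function names are hypothetical, and it is not taken from any SLIP39 or codex32 implementation.

```python
import hashlib
import hmac
import secrets

DIGEST_LEN = 4  # bytes of digest, per the SLIP39 text quoted above

def make_digest_share_value(secret: bytes, r: bytes = None) -> bytes:
    """Return D = digest || R, where digest = HMAC-SHA256(key=R, msg=S)[:4]."""
    if r is None:
        # R fills the remaining n - 4 bytes with random data.
        r = secrets.token_bytes(len(secret) - DIGEST_LEN)
    digest = hmac.new(r, secret, hashlib.sha256).digest()[:DIGEST_LEN]
    return digest + r

def digest_matches(secret: bytes, d: bytes) -> bool:
    """Check a recovered secret S against a recovered digest share value D."""
    digest, r = d[:DIGEST_LEN], d[DIGEST_LEN:]
    expected = hmac.new(r, secret, hashlib.sha256).digest()[:DIGEST_LEN]
    return hmac.compare_digest(digest, expected)
```

A wrong combination of shares interpolates to an unrelated S and D, so `digest_matches` fails except with probability about 2^-32.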

With a digest and BIP32 fingerprint identifier, all 9 thresholds, all 32 indexes and the identifier, can be recovered without prompting by scanning a threshold of QRs. Wallets may not know the threshold to display the user's progress towards it until a threshold or 3rd share is scanned however (unless valid thresholds are restricted to 2 or 3, then it can be learned & displayed after 1 compact CodexQR).

It may also solve problems we encountered in #57 regarding "resharing" a master seed. Digest gives strong protection against invalid sets, even if the identifier is accidentally (or intentionally) reused.

What do you think about amending the "For an existing master seed" to include a 4 byte digest in share A?

BenWestgate Sep 18, 2023
Author

"R" in slip39 is defined as the random part of the digest share. If before hmac, R or S is prepended with threshold and identifier, it should be possible (not guaranteed but very likely) to grind and uniquely recover custom identifiers too. 1/2^12 chance of failure to detect an invalid set of QRs with a custom ID, though what it finds for a valid identifier should be displayed for confirmation.

apoelstra Sep 18, 2023
Maintainer

What do you think about amending the "For an existing master seed" to include a 4 byte digest in share A?

How about if we use share 9 for this, which is the last letter alphabetically and least likely that the user will generate and store? (Though I guess if we are using random share indices they all have equal probability..)

But I think it's a good recommendation. We should use MAY for it since I'm hesitant to say that it's always the right thing to do, or worth the theoretic privacy loss. It would be nice to have some way to determine whether the digest is supposed to be there or not. Even if computer-generated shares MUST or SHOULD have the digest, hand-generated ones won't.

It may also solve problems we encountered in #57 regarding "resharing" a master seed. Digest gives strong protection against invalid sets, even if the identifier is accidentally (or intentionally) reused.

Ah, nice, that's true -- again, assuming that we have a way to determine whether the digest is supposed to be there.

BenWestgate Sep 18, 2023
Author

What do you think about amending the "For an existing master seed" to include a 4 byte digest in share A?

How about if we use share 9 for this,
You said it complicates paper sharing to not use indices in alphabetical order, that's why I said 'A', but you can't compute a secure digest without putting an existing master seed on electronics.

One may argue at that point: why not secret split electronically? But there's a big difference between picking 4 bytes (~6 characters) of 1 share non-randomly and the rest "by-the-book" vs electronically doing the whole split (especially if users won't audit.)

(Though I guess if we are using random share indices they all have equal probability..)

It's absolutely no trouble to delete an index before shuffling, I already delete 's' and indices from "existing shares".

It would be nice to have some way to determine whether the digest is supposed to be there or not.

Well if the padding is the ECC type that gives 2-bits * threshold confirmation to expect the digest, padding plus a specific ID (bip32 fingerprint) is probably best sign to expect the digest.

apoelstra · 2023-09-18T19:02:51Z

Maintainer

You said it complicates paper sharing to not use indices in alphabetical order, that's why I said 'A', but you can't compute a secure digest without putting an existing master seed on electronics.

If we put the digest in the share, should this be a share we expect the user to have? We can't really do that, reliably, so I assumed we wanted to put it in a share we don't expect the user to have (and you'd need threshold-many shares to recompute it to check it).

We still can't guarantee that the user won't have a particular share, but it's an easier thing to make probable.

Well if the padding is the ECC type that gives 2-bits * threshold confirmation to expect the digest, padding plus a specific ID (bip32 fingerprint) is probably best sign to expect the digest.

I think we agreed elsewhere that the padding should be random, under normal circumstances. When generating by hand it will be random, and somewhat difficult/annoying to skew because of the way the dice worksheet works.

Then, if we want the ID to be something specific, we again have trouble in the re-sharing case where the ID is supposed to change.

BenWestgate · 2023-09-19T20:18:57Z

Author

Digest Flag

It would be nice to have some way to determine whether the digest is supposed to be there or not. Even if computer-generated shares MUST or SHOULD have the digest, hand-generated ones won't.

Besides "special" padding or IDs, changing the hrp to flag for it or adding even more non-random data to the digest share (a couple bytes prefix/suffix), I'm not seeing great ways to flag for a digest.

For the QRs however, they have 1 free bit that can flag for a digest and still encode the threshold if it's restricted to 2 or 3.

It's not intuitive but the threshold could be written as A, C, D, E, F... for 2, 3, 4, 5, 6... when a digest and bip32 fingerprint ID is present.

Digest share index

If we put the digest in the share, should this be a share we expect the user to have? We can't really do that, reliably, so I assumed we wanted to put it in a share we don't expect the user to have (and you'd need threshold-many shares to recompute it to check it).

A reason to put digest in a share the user may have: If you did this by hand (roll random data, put the characters on a computer to create a digest, write it into the digest share.) It's more labor to interpolate another share to destroy the digest share. A 30, 35 or 40-bit digest would be less labor than 4 bytes as random data and digest won't mix in the same character. 40 is easiest as it's a direct 5-byte conversion.

Does it matter if they have the digest share or not? The digest share's index could even depend on the secret for 5 more bits of security. Schrodinger's digest share.

If it matters, then extend "R" to include all the randomly generated share data, not just the m - 32 random bits of the digest share. Now it always requires T − 1 share values to help brute-force search.

From SLIP39:

Let m denote the entropy of the shared secret in bits. A disadvantage of encoding the digest of the shared secret is that an attacker who has knowledge of T − 1 share values can reduce the entropy of the shared secret to m − 32 bits by performing a brute-force search over the 2^m possible values of the shared secret and eliminating the ones which give an invalid digest. The entropy of the shared secret must be sufficiently large to make such attacks impractical, which is why this specification requires that m ≥ 128.

The digest could be hardened by a KDF.

I think we agreed elsewhere that the padding should be random, under normal circumstances. When generating by hand it will be random, and somewhat difficult/annoying to skew because of the way the dice worksheet works.

Parity bit padding is hand computable. Generate (or have) 128 random bits, then count the ones, and even position ones. It triples the (non-index & non-payload) bandwidth of compact QRs to 3-bits so is worth using if they may ever want to draw QRs of their shares. 3-bits allows encoding all thresholds plus some weak protection against combining wrong QR sets. (even w/o a digest)
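
One possible reading of the parity padding described above, as a sketch. Which positions count as "even" and the order of the two padding bits are assumptions made for this example; nothing in the thread fixes them.

```python
def parity_padding(bits):
    """Return two padding bits for a 128-bit seed given as a list of 0/1 values.

    First bit: parity of all ones. Second bit: parity of the ones found at
    even positions (0, 2, 4, ... counting from zero, an assumption).
    """
    if len(bits) != 128:
        raise ValueError("expected 128 bits")
    overall = sum(bits) % 2
    even_positions = sum(bits[0::2]) % 2
    return overall, even_positions
```

Both counts can be done by hand while copying the bits, which is the point of choosing parity over a hash here.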

Non-random Digest or Padding are the same. So let padding be random.

There may be an option with a 5 byte digest (8 characters) that avoids mixing random and non-random data in the same character complicating the dice worksheet. It feels like it doesn't matter (??) which share bits are non-random, the total security drop will be the same. So they can move around across and within shares without changing the T - 1 brute force situation.

Ex: ordinary T=2 codex32 has 260-bits of security on a 128-bit secret, which is why padding could be non-random for no T-1 privacy loss.

How about generate T * 26 - 8 characters randomly and the final 8 of the T-th share is optionally the digest of the previous shares concatenated and first 27 characters of this final share... No funny business, just type the complete and incomplete strings into sha256sum and convert the first 5 bytes to bech32.
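
The "sha256sum, then first 5 bytes to bech32" step might look like the sketch below. The exact preimage (how the complete and incomplete strings are joined) is an assumption, and the function names are made up; only the 5-byte to 8-character conversion is mechanical.

```python
import hashlib

BECH32_CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

def bytes_to_bech32_chars(data: bytes) -> str:
    """Map bytes to bech32 characters, 5 bits each (5 bytes -> 8 characters)."""
    if (len(data) * 8) % 5:
        raise ValueError("bit length must be a multiple of 5")
    value = int.from_bytes(data, "big")
    n_chars = len(data) * 8 // 5
    return "".join(
        BECH32_CHARSET[(value >> (5 * i)) & 31] for i in reversed(range(n_chars))
    )

def digest_chars(complete_strings, incomplete_string: str) -> str:
    """Hash the concatenated strings (assumed layout) and keep 40 bits."""
    preimage = "".join(complete_strings) + incomplete_string
    first5 = hashlib.sha256(preimage.encode("ascii")).digest()[:5]
    return bytes_to_bech32_chars(first5)
```

Because 40 bits is exactly 8 characters, no digest character mixes random and non-random bits.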

Can the digest be codex32 instead of SHA256?

Can you tell me what's wrong with setting these 6-8 characters using a truncated codex32 checksum (with data set to all random characters generated) as the "digest"? It seems you lose len(digest) security at T - 1 no matter what, so why not make it a hand computable digest? This seems like it protects against malicious tampering if adversary lacks T shares? Or could structured errors be introduced to one share that adversary knows won't change the digest? In that case, sha256 is better, unless the ID is always fingerprint.

Recovering the identifier from compact QRs and detecting Invalid combinations

Then, if we want the ID to be something specific

For compact QRs, to recover an ID it must be either standardized, included in the digest and brute forced for or encrypted by the digest. Perhaps the ID is XOR'd over the digest, using the known 12-bits of the 32 to confirm the set combination is valid and/or shares are untampered and the first 20-bits to decrypt and recover the ID. That way we get full 32-bit protection against bad combos and malicious tampering when using full shares and 12-bit when using compact QRs.
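
The XOR idea can be sketched with plain integers. This assumes a 32-bit digest with the 20-bit ID XOR'd into its top 20 bits and the low 12 bits left as a check; the bit layout and function names are illustrative assumptions, not a proposal for the actual encoding.

```python
ID_BITS = 20
CHECK_BITS = 12

def mask_id(digest32: int, identifier: int) -> int:
    """Store the 20-bit ID XOR'd over the first 20 bits of a 32-bit digest."""
    if not 0 <= identifier < (1 << ID_BITS):
        raise ValueError("identifier must fit in 20 bits")
    return digest32 ^ (identifier << CHECK_BITS)

def unmask_id(stored32: int, recomputed_digest32: int):
    """Return the ID, or None if the 12 check bits disagree (bad set)."""
    diff = stored32 ^ recomputed_digest32
    if diff & ((1 << CHECK_BITS) - 1):
        return None
    return diff >> CHECK_BITS
```

With full shares the ID is already known, so all 32 bits can be compared; with compact QRs only the 12 unmasked bits act as a check.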

For encryption to work in the 8 character digest example above, the digested data would have to be limited to just index and payload characters so the QRs can compute it.

the re-sharing case where the ID is supposed to change.

We decided the threshold and ID alone must not be the only nonce when re-sharing. The user gives a "unique string" and the wallet gives a monotonic counter + serial number / install date.

Then the bip32 fingerprint ID is encrypted by the unique string for the default ID. IDs may rarely collide, but the share payloads are unique and secure if either wallet or user created a nonce. Combining two shares with the same ID that are different sets can be overwhelmingly detected by the digest.

BenWestgate · 2023-09-20T20:40:34Z

Author

On "standard" Codex32 QR if just 1 character is deleted the remaining 47 can be encoded in a 25x25 QR rather than needing size 29x29.
25x25 can be hand drawn in 8 minutes, 29x29, probably adds 60% to drawing time. Doesn't seem worth it. See below:

Removing one character from the 48 character string:

Using all 48 characters:

Alternatively, drop 10 characters of codex32 checksum allows using ECC level M on a 25x25:

Level M corrects twice the errors (15%). In theory, the codex32 checksum can assist the QR ECC. In practice, a too-damaged QR just won't scan and they'll type their written codex32 share. Until there's a hybrid decoder, QR ECC uses space better than codex32's.

Scanning the third with ECC level M is only slightly better when I obscure significant amounts, (1 more row, max) the main advantage for non-compact QR is they scan as strings rather than bytes. Drawing longer to get more ECC is not worth it.

And this is your idea, max everything out: 48 characters, Level H (30% correction) ECC fits in 33x33:

Near a thousand elements so printable only, but unlike the 3 above it scans with major damage:

The highest ECC level, H, is hit or miss: sometimes it's far more robust, other times it's like "L" and takes noticeably longer to scan. It's more frustrating to use until the day it repairs damage.

All of these decode equally well since any deleted checksum is re-created. Nothing needs to be standard as long as only checksum characters are dropped.
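
The size claims above can be reproduced from the alphanumeric capacities of QR versions 1 to 4. A small sketch: the capacity figures are the standard published values for alphanumeric mode, while the helper names are made up for this example.

```python
# Alphanumeric-mode character capacity per QR version and ECC level.
ALNUM_CAPACITY = {
    1: {"L": 25, "M": 20, "Q": 16, "H": 10},
    2: {"L": 47, "M": 38, "Q": 29, "H": 20},
    3: {"L": 77, "M": 61, "Q": 47, "H": 35},
    4: {"L": 114, "M": 90, "Q": 67, "H": 50},
}

def side_length(version: int) -> int:
    """Modules per side: 21 for version 1, plus 4 per version after that."""
    return 17 + 4 * version

def smallest_version(n_chars: int, ecc: str):
    """Smallest version (1-4) holding n_chars uppercase characters, else None."""
    for version in sorted(ALNUM_CAPACITY):
        if ALNUM_CAPACITY[version][ecc] >= n_chars:
            return version
    return None

for n_chars, ecc in ((47, "L"), (48, "L"), (38, "M"), (48, "H")):
    version = smallest_version(n_chars, ecc)
    print(n_chars, ecc, side_length(version))  # 25, 29, 25, 33
```

This matches the thread: 47 characters fit 25x25 at level L, the full 48 need 29x29, 38 characters fit 25x25 at level M, and 48 characters at level H need 33x33.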

6 replies

BenWestgate Sep 21, 2023
Author

For the QRs however, they have 1 free bit that can flag for a digest and still encode the threshold if it's restricted to 2 or 3.
This sounds great to me. I think it's fine to say "for thresholds higher than 3 there is no compact encoding".

I can decode threshold 2 or 3 by scanning two compact QRs and leave the bit available to flag for a future digest.

The UX loss is the implementation won't know the threshold if the first share is a compact QR.

After the first is scanned, it can only say: enter/scan "your next share". Instead of "share 2 of 2" or "share 2 of 3"

Probably worth gaining the flag bit. Maybe it's a privacy feature by blinding threshold until adversary finds 2 compact QR codes!

However, isn't a digest something we want the codex32 string itself to be able to flag for? Especially, if it's made with shortened codex32 "quickchecks" and could be added by hand instead of filling the final 6-8 random characters by dice?

I think we should do a 6-character digest if it's codex32, as there's no way to harden it. That leaves 100-bit security with T-1 shares compromised, correct?

Codex32 digest or hardened KDF digest are mutually exclusive. So let's think about the value proposition of a hand computable digest. I know it has error correction guarantees unlike SHA-2.

Would it be faster to produce a 6-8 character codex32 "quick checksum" over the header and random payloads (26+20 char for T=2, 52+20 for T=3) than roll these characters?

Going the hardening route always ends up being millions of times weaker to support HWWs so I lean towards a codex32 hand computable digest even though it's non-standard.

The book itself says use SLIP-39 if you need KDF hardening.

If you agree: requiring 2 distinct QR scans to learn threshold is acceptable UX and would NOT flag for a future digest within the codex32 payloads, I'll leave that bit free to do so in compact QR.

And we can revisit digest flagging within codex32 strings and whether to use a hardened cryptographic or hand computable digest when your quickcheck work is done.

apoelstra Sep 21, 2023
Maintainer

Probably worth gaining the flag bit. Maybe it's a privacy feature by blinding threshold until adversary finds 2 compact QR codes!

lol :). No, I think we should use the flag to indicate threshold 2/3. But I don't feel super strongly about this. I think it is "acceptable" not to have this, and to have an open-ended "you need more shares" UX.

Codex32 digest or hardened KDF digest are mutually exclusive. So let's think about the value proposition of a hand computable digest. I know it has error correction guarantees unlike SHA-2.

I think we should use the hardened KDF. I think it's a super bad idea to leak actual seed data into shares (and I think basically anything hand-computed would have this problem). I wouldn't mind using a non-hardened hash if you want to make life easier for HWWs.

BenWestgate Nov 16, 2023
Author

After switching to base45 encoding, the 21x21 compact QR has 2.3 free bits, 1-bit may encode threshold 2 or 3.
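
The "2.3 free bits" figure can be reproduced as follows. This sketch assumes the 21x21 level L QR holds 25 characters in alphanumeric (45-symbol) mode and that the compact format stores only the 5-bit index and 130-bit payload; the names are illustrative.

```python
from math import log2

V1_L_ALNUM_CHARS = 25  # alphanumeric capacity of a 21x21 QR at ECC level L

def base45_capacity_bits(n_chars: int = V1_L_ALNUM_CHARS) -> float:
    """Information capacity of n_chars symbols drawn from a 45-symbol alphabet."""
    return n_chars * log2(45)

def free_bits(index_bits: int = 5, payload_bits: int = 130) -> float:
    """Bits left over after the share index and payload are stored."""
    return base45_capacity_bits() - (index_bits + payload_bits)

print(round(base45_capacity_bits(), 1), round(free_bits(), 1))  # 137.3 2.3
```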

The remaining bits can be filled with:

bip32 fingerprint
codex32 checksum
codex32 checksum AND MUST use bip32 fingerprint ID
fingerprint ID flag
compact QRs MUST have digest plus 1, 2 or 3

Which seems like the best option?

bip32 fingerprint

Pros: Immediately rejects (~60%) wrong QRs w/ fingerprint. Bad sets and tampering detected (100% - 40% ^ qr_qty) even without fingerprint.
Cons: Need fingerprint to draw QRs. Can't reject wrong shares.

codex32 checksum

Pros: Direct conversion from string to QR without other knowledge. Wrong shares immediately rejected (100% - 40% ^ qr_qty).
Cons: Must type a share to reject (60%) wrong QRs. Can't detect tampered sets.

codex32 checksum AND MUST use bip32 fingerprint ID

Pros: All pros above, plus bad or tampered set detection (100% - 1 / 2 ^ 20) w/ ID and codex32 string recovery from a QR and fingerprint.
Cons: Must know codex32 string has fingerprint ID to draw QR. Must type share OR know fingerprint to reject (60%) wrong QRs. No* custom IDs (standard QR works) must recover fingerprint and relabel the whole set.

fingerprint ID flag:

Pros: Doesn't require this share feature. If set: Bad set and tampering detection (100% - 1 / 2^20) w/ ID, string recovery w/ fingerprint.
Cons: Set detection requires typing a share. Loses immediate wrong QR/share rejection, must know flag value to draw QRs.

compact QRs MUST have digest

Pros:

Best bad set and tamper detection without an ID or fingerprint.
Immediate detection: can choose one of the schemes above.
Redundancy: a digest helps guide error correction.
Can support any threshold and boost immediate rejection (80% per QR): drop threshold bit, use 2.3-bits of codex32 checksum AND MUST use bip32 fingerprint ID. Threshold is recovered in 2 QRs w/ fingerprint to quit early on bad sets, while digest prevents import before correct threshold with no fingerprint.

Cons:

Must specify a new generation method for digest sets.
Can't create compact QRs for hand generated share sets.
Can't draw QRs if the user doesn't know the shares have a digest.
Digest reduces T-1 security.

*With desktop generation, it's possible to have custom IDs be fingerprint IDs by grinding (<1min) seeds until the fingerprint matches the desired ID. This is no practical security loss since the table of ~2^108 seeds resulting in a given ID can't be stored.
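
The 60% and 80% figures in the options above appear to follow from the spare bits: 2.3 free bits minus 1 threshold bit leaves about 1.3 check bits, and dropping the threshold bit leaves all 2.3. That derivation is inferred, not stated in the thread, so treat this as a sketch with made-up function names.

```python
def reject_rate(check_bits: float) -> float:
    """Chance a random wrong QR disagrees on check_bits bits of check data."""
    return 1 - 2 ** -check_bits

def set_detection(per_qr_miss: float, qr_qty: int) -> float:
    """Chance at least one of qr_qty wrong QRs is caught (100% - miss ^ qr_qty)."""
    return 1 - per_qr_miss ** qr_qty

print(round(reject_rate(1.3), 2))       # 0.59, i.e. roughly 60%
print(round(reject_rate(2.3), 2))       # 0.8
print(round(set_detection(0.4, 2), 2))  # 0.84
print(round(set_detection(0.4, 3), 3))  # 0.936
```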

apoelstra Nov 16, 2023
Maintainer

I don't have a good intuition for any of these options. But I like "fingerprint ID flag" the best because it's the simplest to implement and think about.

BenWestgate Nov 16, 2023
Author

A few thoughts:

QR only backups will be rare. I won't recommend people make compact QRs without also writing the share because they're lossy and have poorer mismatch detection. This makes round trip a luxury.
QR only recoveries get tried first. If someone draws one, they probably drew all, why type when the scan takes 1 second. Especially for ledger style.
Not detecting wrong QR scans can ruin all time savings. Worst case leads to re-travel. We try to mitigate this by labeling the QR with ID, this gives round trips from QR + label to string too.
The most frequent recoveries have fingerprint(s). QRs are popular for stateless signers, which will have access to the fingerprint(s) during recovery and signing.
No fingerprint recovery is rare. Occurs only after the loss of all digital backups.
No fingerprint or known address is rarest. Might happen to heirs.
Heirs care about detecting a tampered set. Someone may be malicious.
Easy conversion from codex32 share to QR matters. The less knowledge needed the better. This way people can update a share set to QRs without gathering a threshold in one place.
Detecting wrong shares is less important. If you have 2 wrong, the first ID visibly rejects the later, while wrong QRs are invisible to humans.
In the future, most shares will have the default ID. None of my dozen+ testers wanted a feature to change share ID.
Software can ask the user to type the ID when the fingerprint is missing so that subsequent wrong QRs can be immediately detected.

BenWestgate · 2023-11-16T15:44:40Z

Author

I don't have a good intuition for any of these options. But I like "fingerprint ID flag" the best because it's the simplest to implement and think about.

My intuition is we want to replicate the codex32 share experience as closely as possible, given the lack of space, that means knowing the threshold and some info about set membership from the first scan. This seems in line with principle of Least Astonishment.

Also remember: any codex32 features sacrificed to fit in 137.3 bits stay in the 25x25 "standard" QR as these directly encode the string. We previously justified culling thresholds to 0, 2 or 3 saying "higher can draw standard QR." Similar could be said for custom* IDs. Agreed?

A flag for digest or fingerprint ID is simple, but users may not expect that these lack the immediate wrong-set rejection ability of the codex32 shares they came from.

It's unhelpful to learn you screwed up only after you've scanned QR 3 of 3 (possibly having flown hours); a descriptor, PSBT, address or empty wallet can tell you this. So I think the early warning of the first 3 options is more "share-like" than flags and more likely to avoid pain and astonishment.

Although maybe a 60% detection rate of wrong QRs is so low it'd be better without. It can reach 80% by requiring a digest and dropping the threshold, but that is less share-like, with software needing 2 scans and the fingerprint to know the threshold, and 1 QR can't convert back to a share (needs 2 + ID or 1 + threshold + ID). Digest generation is more work to implement and incompatible with hand generation. Skip digest complexity and make do with 60%?

I think if users know they're supposed to rely on themselves to visually verify the 4 character QR label matches the previous scan, then extra error detection when they goof up can only help. Right?

The publicly available out of band fingerprint is the magic letting these detect wrong QRs without typing the ID.

*Custom IDs that are also fingerprint IDs can be found with seed grinding on desktop wallets; a few seconds of grinding finds the desired leading 20 bits in the fingerprint.

1 reply

apoelstra Nov 16, 2023
Maintainer

We previously justified culling thresholds to 0, 2 or 3 saying "higher can draw standard QR." Similar could be said for custom* IDs. Agreed?

Ah, yes, agreed.

Digest generation is more work to implement and incompatible with hand generation. Skip digest complexity and make do with 60%?

Yeah, I think 60% is alright. It's a sanity check, not a cryptographic assurance.

BenWestgate · 2024-09-04T23:45:00Z

BenWestgate
Sep 4, 2024
Author

I have thought about QR storage capacity in a new way and realize we have 10.25 free bits across all possible v1 (21x21) CodexQRs if the index and padding are derived from the 16-byte entropy, which seems secure if indexes are kept private.

That supports decoding in one scan: thresholds up to 8 with 1 ID character, and 2 ID characters for threshold = 3.
More ID is given to t=3 because it's our highest recommended threshold, with twice the opportunity to mismatch share sets as t=2.

Much more usable than initially thought!
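A rough budget check of that allocation, as a sketch only: it assumes the supported thresholds are 2 through 8 (the post does not list them), that each ID character carries 32 values, and it takes the 1221 spare codes per 16-byte payload from the details at the end of this comment.

```python
SPARE_CODES = 1221          # figure quoted at the end of this comment
VALUES_PER_ID_CHAR = 32     # one bech32 character

# Assumed threshold set: 2..8, with t=3 getting two ID characters
# and every other threshold getting one.
id_chars = {t: (2 if t == 3 else 1) for t in range(2, 9)}

# Each (threshold, ID prefix) pair needs its own spare code.
needed = sum(VALUES_PER_ID_CHAR ** n for n in id_chars.values())
print(needed, "of", SPARE_CODES, "codes used")
```

Under these assumptions the allocation needs 1216 codes, so it fits with 5 to spare.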

t=2 has a 3.13% probability that 2 random IDs will match in the first character. By requiring the ID to be the fingerprint, there is a further 1/32 chance of this wrong seed having a fingerprint with the same first 5 bits. That gives a false positive rate of about 1 in 1,024.

t=3 has a 0.58% probability that 3 random IDs all match in the first two characters. By requiring the ID to be the fingerprint, there is a further 1/1024 chance of this wrong seed having a fingerprint with the same first 10 bits. That gives a false positive rate of about 1 in 176,551.
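The arithmetic behind these two rates is a product of independent probabilities. A minimal sketch, taking the 0.58% figure for t=3 as given (it is not derived here) and using exact fractions so that rounding the inputs does not shift the result; the function name is only illustrative:

```python
from fractions import Fraction

def false_positive(p_ids_match: Fraction, fingerprint_bits: int) -> Fraction:
    """Chance that wrong shares pass the ID check AND the wrong seed's
    fingerprint happens to share the same leading bits."""
    return p_ids_match * Fraction(1, 2 ** fingerprint_bits)

# t=2: one ID character (5 bits) collides, then 5 fingerprint bits collide.
t2 = false_positive(Fraction(1, 32), 5)

# t=3: 0.58% is the post's figure, then 10 fingerprint bits collide.
t3 = false_positive(Fraction(58, 10_000), 10)

print(f"t=2: 1 in {1 / t2}")
print(f"t=3: about 1 in {float(1 / t3):,.0f}")
```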

CodexQRs must be labeled with their fingerprint and the user will be asked to confirm the recovered fingerprint matches.

Labeling with the BIP32 fingerprint appears to have become a standard template for CompactSeedQR, so this should follow suit.

I think these compacted QRs should be called "CodexQR"; the non-compact ones are merely an alphanumeric mode encoding of a Codex32 string and should be called a "Codex32 QR".

Dropping "32" reflects that the bech32 character set, or number system, is abandoned to fit more in the QR.

Unimportant details of how I found these extra bits:
The code space is: alphanumeric mode lengths 24-25, numeric lengths 39-41, and byte mode length 17. After encoding 16 bytes, the unused possible codes across all 6 of these length + mode combos total 1221.
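One way to reproduce the 1221 figure and the 10.25 bits, offered as a reconstruction rather than the author's stated method: count how many whole copies of the 2^128 payload space fit into each mode and length combination, then sum. The symbol counts (45, 10, 256) are the standard QR mode alphabets; the floor-and-sum step is an assumption.

```python
import math

PAYLOAD_VALUES = 2 ** 128   # every possible 16-byte entropy

# (symbols per character, message length) for each mode + length combo
combos = [
    (45, 24), (45, 25),              # alphanumeric mode
    (10, 39), (10, 40), (10, 41),    # numeric mode
    (256, 17),                       # byte mode
]

# Whole copies of the 128-bit payload space that fit in each combo, summed.
spare = sum(base ** length // PAYLOAD_VALUES for base, length in combos)

print(spare, round(math.log2(spare), 2))
```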

0 replies
