Allow 4-character abbreviations for mnemonic when using the existing-mnemonic workflow #242

yorickdowne · 2022-02-05T14:14:48Z

This addresses #167

Rationale: Users may store their mnemonic in steel to protect against loss. Steel tablets record the first four characters of each word, not the full words. To recover keys, it is desirable that deposit-cli allow the use of 4-character abbreviations.

Previously submitted as #168 and now re-submitted to a) have a better git workflow and not use the master branch directly and b) have a clean history of the PR.

All tests run by python3 -m pytest . pass.

The core of this change is in staking_deposit/key_handling/key_derivation/mnemonic.py.

Choices made:

To get abbreviations from the word list, I normalize each word to 'NFKC' and slice it to at most 4 characters. I chose this normalization so that combining Unicode characters (e.g. a base letter followed by a combining accent) are composed first, and a word doesn't get sliced down to 3 visible characters.
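A minimal sketch of the normalize-then-slice step, assuming a helper named `abbreviate` (hypothetical; the real code in `mnemonic.py` may be structured differently). It uses the word "élève" written with combining accents to show what goes wrong without normalization:

```python
import unicodedata


def abbreviate(word: str, length: int = 4) -> str:
    """Return at most `length` characters of `word` after NFKC normalization.

    NFKC composes a base letter plus a combining accent (e.g. "e" + U+0301)
    into one precomposed code point, so the slice counts visible letters.
    """
    return unicodedata.normalize("NFKC", word)[:length]


decomposed = "e\u0301le\u0300ve"  # "élève" spelled with combining accents (NFD)
naive = decomposed[:4]            # "e", U+0301, "l", "e": only 3 visible letters
fixed = abbreviate(decomposed)    # "\u00e9l\u00e8v": 4 visible letters
```

Without normalization the 4-code-point slice shows only "éle"; after NFKC the accents are precomposed and the slice keeps "élèv".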

I am using a nested for list comprehension that compares the full word language map against the abbreviations, then flattening the resulting 2D list. If there's a more elegant way of doing this, please share!
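A sketch of the nested-comprehension approach next to one possible alternative, a prefix-to-word dictionary. Both helper names and the five-word `toy` list are hypothetical stand-ins for a real 2048-word BIP-39 list; the dictionary version relies on the first four letters identifying a word uniquely within one list.

```python
import unicodedata
from itertools import chain
from typing import Dict, List, Optional


def abbreviate(word: str, length: int = 4) -> str:
    """NFKC-normalize, then keep at most `length` characters."""
    return unicodedata.normalize("NFKC", word)[:length]


def expand_nested(words: List[str], word_list: List[str]) -> List[str]:
    """Nested comprehension: one inner list per input word, then flatten."""
    nested = [
        [full for full in word_list if abbreviate(full) == abbreviate(word)]
        for word in words
    ]
    # Unmatched words contribute an empty inner list, so they simply vanish here.
    return list(chain.from_iterable(nested))


def expand_with_lookup(words: List[str], word_list: List[str]) -> Optional[List[str]]:
    """Dictionary: build prefix -> full word once, then one lookup per word.

    Returns None as soon as a word has no match in this list.
    """
    lookup: Dict[str, str] = {abbreviate(full): full for full in word_list}
    expanded: List[str] = []
    for word in words:
        full = lookup.get(abbreviate(word))
        if full is None:
            return None  # this word is not in the list
        expanded.append(full)
    return expanded


toy = ["abandon", "ability", "able", "about", "above"]
print(expand_nested(["aban", "able", "abov"], toy))  # ['abandon', 'able', 'above']
print(expand_with_lookup(["abili", "zzzz"], toy))    # None
```

The dictionary is built once per language, so expansion costs one lookup per word instead of a scan of the whole list per word, and a missing word is reported explicitly rather than silently shortening the result.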

Because abbreviations can match words in several languages and then fail only at the checksum, I continue searching through the remaining languages when a checksum fails.
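A sketch of that search loop, assuming hypothetical helpers `checksum_ok` and `detect_language` and swapping the nested comprehension for a dictionary lookup. The checksum follows BIP-39 (11 bits per word; the last total/33 bits are the leading bits of SHA-256 of the entropy); the actual deposit-cli code may differ.

```python
import hashlib
import unicodedata
from typing import Dict, List, Optional, Tuple

VALID_LENGTHS = (12, 15, 18, 21, 24)  # BIP-39 mnemonic lengths


def abbreviate(word: str, length: int = 4) -> str:
    """NFKC-normalize, then keep at most `length` characters."""
    return unicodedata.normalize("NFKC", word)[:length]


def checksum_ok(indices: List[int]) -> bool:
    """BIP-39 check: the last total/33 bits must equal the leading bits
    of SHA-256(entropy), where entropy is everything before them."""
    if len(indices) not in VALID_LENGTHS:
        return False
    total_bits = 11 * len(indices)
    cs_bits = total_bits // 33
    value = 0
    for index in indices:
        value = (value << 11) | index
    entropy = (value >> cs_bits).to_bytes((total_bits - cs_bits) // 8, "big")
    expected = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    return (value & ((1 << cs_bits) - 1)) == expected


def detect_language(
    words: List[str], word_lists: Dict[str, List[str]]
) -> Optional[Tuple[str, List[str]]]:
    """Return (language, full words) for the first list whose checksum passes."""
    prefixes = [abbreviate(word) for word in words]
    for language, word_list in word_lists.items():
        lookup = {abbreviate(full): i for i, full in enumerate(word_list)}
        if not all(prefix in lookup for prefix in prefixes):
            continue  # at least one word is not in this language
        indices = [lookup[prefix] for prefix in prefixes]
        if checksum_ok(indices):
            return language, [word_list[i] for i in indices]
        # Every prefix exists here, but the checksum failed: the same prefixes
        # may map to different indices in another list, so keep searching.
    return None
```

With the real word lists, `word_lists` would map each language name to its 2048 words.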

The new-mnemonic workflow has not been touched and still does a straight compare. Allowing abbreviations during the verification step in that workflow would be a different PR.

CarlBeek · 2022-03-24T12:10:08Z

Thanks for the PR @yorickdowne, very cool! 🦏 This will be especially useful to have implemented as people need to generate their withdrawal keys when withdrawals become possible soon (tm).

Some things:

I made a PR against your branch with some minor tweaks + some testing here: https://github.com/yorickdowne/eth2.0-deposit-cli/pull/1
I am a little worried about how shortening words works in languages with non-Latin alphabets. These changes work well in English etc., but shortening words with word[:4] may not work in other alphabets. That said, I can't find an instance of this being the case (shortening Chinese, Korean, and Japanese with word[:4] has no effect). Things would break with Arabic, though (maybe this is why there isn't an Arabic BIP39 word list?).
Agreed, NFKC is the reasonable choice here. 👍
"I am using a nested for list comprehension ..." Yeah, unfortunately, this looks to be the best solution here. My automagic language detection is pretty ugly to start and unfortunately it doesn't interact well with this PR. I'm not unhappy tho.
I am worried about my language detection getting confused by multiple languages passing the checksum now that only 4 letters are used. A 12-word mnemonic only has 4 checksum bits, so I wonder whether it is possible for two languages to pass the checksum check when only the first 4 letters of each word are given.
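The rough odds behind this worry, using the BIP-39 sizes and assuming that, when every 4-letter prefix also exists in a second word list, the resulting indices behave like random ones with respect to the checksum:

```latex
\begin{aligned}
12 \times 11 &= 132 = 128 + 4 \quad (\text{entropy bits} + \text{checksum bits}) \\
\Pr[\text{second language passes the checksum}] &\approx 2^{-4} = 1/16 \\
24 \times 11 &= 264 = 256 + 8 \;\Rightarrow\; \Pr \approx 2^{-8} = 1/256
\end{aligned}
```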

yorickdowne · 2022-03-24T13:32:35Z

Merged. I'm not sure the changes cover the new-mnemonic workflow, and I see you changed the instructions there. I'll take a look and add new-mnemonic abbreviation handling if it's not already in.
Shortening with word[:4] works for all test languages so far, because of the chosen NFKC normalization. If a character is represented by two or more Unicode code points (such as a base letter plus a combining accent), NFKC composes them into a single code point where a precomposed form exists. Good question how this would work for Arabic. BIP-39 does not mandate four-letter abbreviations, but does cite them as part of an "ideal word list". I'd say we cross that bridge if there are ever word lists that do not support abbreviations.
I share this worry. None of the examples in the tests conflict, but that doesn't mean it's impossible. I can think of five solutions.
a) Trust that automatic language detection will keep working for Spanish and Portuguese and that they won't produce conflicting checksums. Right now, these are the only two word lists I am aware of where an abbreviated mnemonic could even be detected as two languages.
b) If Spanish or Portuguese are detected, prompt the user to confirm the language before moving on. Allow them to select the correct language.
b') The same but always prompt the user to confirm, do not have an opinion of which word lists may generate multiple answers.
b'') Do not return immediately when a match is found. Store the result and continue iterating through the languages. If there are two or more valid results, show them to the user and ask them to choose the correct language.
c) Ditch automatic language detection

Of all these, I like b'') best. It addresses the issue narrowly while keeping the existing UX intact for the vast majority of cases - possibly even all cases.
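A sketch of option b''), assuming hypothetical helpers `candidate_languages` and `resolve`: every language is checked instead of stopping at the first hit, and more than one passing language is treated as ambiguous. Here `resolve` raises an error; an interactive CLI could prompt the user at that point instead.

```python
import hashlib
import unicodedata
from typing import Dict, List


def abbreviate(word: str, length: int = 4) -> str:
    """NFKC-normalize, then keep at most `length` characters."""
    return unicodedata.normalize("NFKC", word)[:length]


def checksum_ok(indices: List[int]) -> bool:
    """BIP-39 checksum over 11-bit word indices (12 to 24 words)."""
    if len(indices) not in (12, 15, 18, 21, 24):
        return False
    total_bits = 11 * len(indices)
    cs_bits = total_bits // 33
    value = 0
    for index in indices:
        value = (value << 11) | index
    entropy = (value >> cs_bits).to_bytes((total_bits - cs_bits) // 8, "big")
    expected = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    return (value & ((1 << cs_bits) - 1)) == expected


def candidate_languages(
    words: List[str], word_lists: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Collect every language in which the (abbreviated) words pass the checksum."""
    prefixes = [abbreviate(word) for word in words]
    candidates: Dict[str, List[str]] = {}
    for language, word_list in word_lists.items():
        lookup = {abbreviate(full): i for i, full in enumerate(word_list)}
        if all(prefix in lookup for prefix in prefixes):
            indices = [lookup[prefix] for prefix in prefixes]
            if checksum_ok(indices):
                candidates[language] = [word_list[i] for i in indices]
    return candidates


def resolve(words: List[str], word_lists: Dict[str, List[str]]) -> List[str]:
    """Return the full words if exactly one language matches."""
    candidates = candidate_languages(words, word_lists)
    if not candidates:
        raise ValueError("Not a valid mnemonic in any supported language.")
    if len(candidates) > 1:
        # An interactive CLI could ask the user to pick one of these instead.
        raise ValueError(f"Mnemonic is valid in several languages: {sorted(candidates)}")
    return next(iter(candidates.values()))
```

The common case (exactly one match) behaves like the existing detection; only genuinely ambiguous input reaches the extra step.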

CarlBeek · 2022-03-24T15:40:47Z

Indeed, I should not have included the new-mnemonic instructions.
Agreed, I think we're ok on this front.
Yeah, my gut feeling is that I could find a collision if I tried, especially between the Romance languages. I don't think it's common in normal usage, but we have hundreds of thousands of users and it's not cryptographically unlikely.
- I'm not convinced b) is sufficient. There may be collisions between more of the languages. Also adding special cases is an anti-pattern.
- Agreed that b'') seems like the best answer for now.
- I'm personally for c) in the longer term. I think we should ask the user for their mnemonic language and then use predictive text to help the user enter words as they go.
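Option c) could look roughly like this once the user has picked a language: a prefix search over that language's word list. `completions` is a hypothetical helper, not part of deposit-cli; it sorts a normalized copy of the list so it does not depend on the list's stored order.

```python
import bisect
import unicodedata
from typing import List


def completions(prefix: str, word_list: List[str], limit: int = 5) -> List[str]:
    """Return up to `limit` words from `word_list` that start with `prefix`."""
    prefix = unicodedata.normalize("NFKC", prefix)
    words = sorted(unicodedata.normalize("NFKC", w) for w in word_list)
    # All words sharing a prefix are contiguous in sorted order,
    # starting at the insertion point of the prefix itself.
    start = bisect.bisect_left(words, prefix)
    matches: List[str] = []
    for word in words[start:]:
        if len(matches) >= limit or not word.startswith(prefix):
            break
        matches.append(word)
    return matches


toy = ["abandon", "ability", "able", "about", "above", "absent"]
print(completions("abo", toy))  # ['about', 'above']
```

A UI built on this could auto-complete a word as soon as exactly one completion remains.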

CarlBeek · 2022-03-28T09:28:00Z

I opened another PR (https://github.com/yorickdowne/eth2.0-deposit-cli/pull/2) against this branch to address the last few things I'd like to get done before merging this PR.

Adds script test for existing_mnemonic
Adds abbreviated words functionality to new_mnemonic (it was a 1-liner, lol)
Adds script test for new_mnemonic
Adds check that a mnemonic isn't valid in multiple languages in case abbreviated words were used.

CarlBeek

Thanks for all the work & changes @yorickdowne!

For those wondering, the last force-push was a rebase onto dev


CarlBeek changed the base branch from master to dev March 28, 2022 09:28

yorickdowne and others added 7 commits March 28, 2022 12:50

Allow 4-character abbreviations of mnemonic words

fb5ffe6

abbreviation tweaks

5953f30

Adds abbreviation tests

567ae36

Fix lint; revert new-mnemonic text change

bbeeb1f

Adds tests for existing mnemonic abbreviated words

202a620

Adds abbreviated words to new-mnemonics

885826c

Adds check that multiple mnemonic languages aren't detected

0d3440e

CarlBeek approved these changes Mar 28, 2022


CarlBeek merged commit c97cfc3 into ethereum:dev Mar 28, 2022

This was referenced Mar 28, 2022

Release version v2.1.0 #252

Merged

Actually shortens words for abbreviated mnemonic test vectors #254

Merged

everhusk pushed a commit to earthwallet/earth-wallet-cli that referenced this pull request Aug 3, 2023

Merge pull request ethereum#242 from yorickdowne/abbreviations

289f940

Allow 4-character abbreviations for mnemonic when using the existing-mnemonic workflow

karalabe mentioned this pull request Oct 1, 2023

Validator manager should accept 4 letter prefixes for mnemonic words sigp/lighthouse#4797

Open
