Skip to content
This repository has been archived by the owner on Jul 15, 2018. It is now read-only.

Word codec should implement BIP39 #89

Closed
ebuchman opened this issue May 8, 2018 · 24 comments
Closed

Word codec should implement BIP39 #89

ebuchman opened this issue May 8, 2018 · 24 comments
Assignees

Comments

@ebuchman
Copy link
Contributor

ebuchman commented May 8, 2018

keys/words/wordcodec.go should implement BIP39.

I believe it currently uses an adhoc ECC checksum. Instead it should follow https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki

In particular, it must be compatible with the fundraiser keys, ie. https://github.com/cosmos/fundraiser-lib/blob/master/src/wallet.js

(edit by @liamsi) resolves #116

@ebuchman
Copy link
Contributor Author

ebuchman commented May 8, 2018

Also, note, we currently encode the key type into the seed phrase.

Instead, we should have the output be something like ed25519{monkey tree house ... } so that the type is explicit and not encoded into the seed (the seed must be compatible with the bip39 standard and the fundraiser ...).

Related: cosmos/cosmos-sdk#872

@liamsi
Copy link
Contributor

liamsi commented May 11, 2018

keys/words/wordcodec.go should implement BIP39.

@ebuchman Just to be sure (before I start working on this): by implement you just mean the Codec interface should implement ... , right? You are not suggesting to add another BIP39 golang implementation?

@ebuchman
Copy link
Contributor Author

We can use an existing one - we'll have to review it tho. And it should probably fall under whatever policy we decide re #60

@liamsi
Copy link
Contributor

liamsi commented May 15, 2018

Hmm, we currently use https://github.com/tyler-smith/go-bip39 in the keys/hd subpackage. But there we basically check against test-vectors:

b, err := ioutil.ReadFile("test.json")

at the end of this test we verify that it yields the same results for the testfile:
for i, d := range hdToAddrTable {
privB, _ := hex.DecodeString(d.Priv)
pubB, _ := hex.DecodeString(d.Pub)
addrB, _ := hex.DecodeString(d.Addr)
seedB, _ := hex.DecodeString(d.Seed)
masterB, _ := hex.DecodeString(d.Master)
seed := bip39.NewSeed(d.Mnemonic, "")

Also the project seems dead: tyler-smith/go-bip39#7
Good thing about this package, it is very simple, short and probably easy to review.

There is also https://github.com/btcsuite/btcutil/tree/master/hdkeychain (related to #90)

@liamsi
Copy link
Contributor

liamsi commented May 17, 2018

OK, after some long code reading and debugging sessions (comparing behaviour and outputs from golang and JS): above mentioned bip39 library (and some forks thereof) seems to do something funky sometimes when the generated entropy has leading zeroes... :-/

It seems related to how the library does the checksum calculation using a big.Int internally:
https://github.com/bartekn/go-bip39/blob/a05967ea095d81c8fe4833776774cfaff8e5036c/bip39.go#L196-L207
(that's a fork which fixes some issues related to this and which is used in stellar)

Using big.Int basically deletes all leading zeroes so the checksum differs from the spec (and other implementations, e.g., the js we used for the fundraiser, which computes the checksum correctly).

I have a version which works and is compatible to the fundraiser code (bip39 fork and go-crypto branch. But I changed my mind, and I'll add the functionality to our code instead of using a third party lib ...

update: I confused this above, bartnekn's fork works fine tyler's lib doesn't.

@ebuchman
Copy link
Contributor Author

Is this the same issue as https://github.com/cosmos/fundraiser-lib/blob/master/src/wallet.js#L94 ?

I also thought we had tests for this in keys/hd where test.json is generated by the JS code and we're testing we get the same result with tyler-smith - how did you find they dont match ?

@liamsi
Copy link
Contributor

liamsi commented May 21, 2018

Is this the same issue as https://github.com/cosmos/fundraiser-lib/blob/master/src/wallet.js#L94 ?

It looks a bit similar but it isn't exactly the same.

I also thought we had tests for this in keys/hd where test.json is generated by the JS code and we're testing we get the same result with tyler-smith - how did you find they dont match ?

Yes, we do (I linked them above). These tests fo not cover the WordsToBytes equiv. part of bip39. The only test there is one that takes a mnemonic and generates a seed from it (as described here in the spec), (and from that seed creates a master and a priv key from a path). This matches with the JS code :-) The part that doesn't match is the one which takes a mnemonic and recreates the entropy from it (WordsToBytes). This isn't really part of the bip32 spec but we want to this to be able to call the keys.Recover method from the add command in cosmos-sdk (as far as I understand). Also, the JS lib does this right.
Back to your question:

how did you find they dont match ?

To find how they differ, I just ran the existing test (first with tyler's then with bartek's fork), picked the failing as test vectors, if any, plugged them into this JS snippet (to see if this works there):

t.test('compare with bip39.AddChecksum', function (t) {
    var entropy = Buffer.from("000CF70C02EA90959B78FB43F683A690",'hex')
    t.ok(true, entropy.toString('hex'))
    var mem = bip39.entropyToMnemonic(entropy.toString('hex'))
    t.ok(true, mem)
    var toEnt = bip39.mnemonicToEntropy(mem)
    t.ok(true, "back to ent "+ toEnt.toString('hex'))
    
    t.end()
})

output:

# compare with bip39.AddChecksum
ok 1 000cf70c02ea90959b78fb43f683a690
ok 2 abandon guilt seek alarm poverty enlist hospital buyer dumb reduce trust candy
ok 3 back to ent 000cf70c02ea90959b78fb43f683a690

Find 2 golang examples here (output in the comments):
https://gist.github.com/Liamsi/b14eec5d6e273c27c95c74a4200ddd37
https://gist.github.com/Liamsi/a4848b6b79e61719e383d1f6bd567b30

@liamsi
Copy link
Contributor

liamsi commented May 21, 2018

Maybe it's best to have a tendermint/bip32 fork with the changes applied in my/bartekn's fork. The we would want full compatibility to the fundraiser lib tested, though. At least for every aspect we use.

@ebuchman
Copy link
Contributor Author

ebuchman commented May 22, 2018

Wow! Thanks for the details. Just to be clear on steps:

  1. generate entropy
  2. convert entropy to mnemonic
  3. hash mnemonic to get seed

You're saying all of this worked fine, but the function to go backwards from (2) to (1) was broken ? Is that something we ever actually need to do ?

In --recover, it should be fine to just go from mnemonic to seed - why would we need to get back to the entropy ?

@liamsi
Copy link
Contributor

liamsi commented May 22, 2018

Thanks, that is very helpful. Yes, what you write is correct: (2) to (1) does not work (in tyler-smith's lib).

In --recover, it should be fine to just go from mnemonic to seed - why would we need to get back to the entropy ?

Currently,WordToBytes doesn't do what you describe though (independent from the checksum mentioned in this issue description above). Going from mnemonic to seed as described in the spec doesn't even deal with a checksum (see below). The code in WordToBytes is almost the same as the the one in tyler's lib (independent from the changes I made) to get back the entropy from the mnemonic:

func (c *WordCodec) WordsToBytes(words []string) ([]byte, error) {
l := len(words)
if l == 0 {
return nil, errors.New("Didn't provide any words")
}
n2048 := big.NewInt(2048)
nData := big.NewInt(0)
// since we output words based on the remainder, the first word has the lowest
// value... we must load them in reverse order
for i := 1; i <= l; i++ {
rem, err := c.GetIndex(words[l-i])
if err != nil {
return nil, err
}
nRem := big.NewInt(int64(rem))
nData.Mul(nData, n2048)
nData.Add(nData, nRem)
}
// we copy into a slice of the expected size, so it is not shorter if there
// are lots of leading 0s
dataBytes := nData.Bytes()
// copy into the container we have with the expected size
outLen, flex := bytelenFromWords(len(words))
toCheck := make([]byte, outLen)
if len(dataBytes) > outLen {
return nil, errors.New("Invalid data, could not have been generated by this codec")
}
copy(toCheck[outLen-len(dataBytes):], dataBytes)
// validate the checksum...
output, err := c.check.CheckECC(toCheck)
if flex && err != nil {
// if flex, try again one shorter....
toCheck = toCheck[1:]
output, err = c.check.CheckECC(toCheck)
}
return output, err
}

compare to tyler's lib:
https://github.com/tyler-smith/go-bip39/blob/8e7a99b3e716f36d3b080a9a70f9eb45abe4edcc/bip39.go#L92-L151
(logically, they only differ in the checksum and tyler's lib has an additional validation step)

  1. hash mnemonic to get seed
  1. or "getting the seed" on the other hand is basically a one-liner:
    pbkdf2.Key([]byte(mnemonic), []byte("mnemonic"+password), 2048, 64, sha512.New)

why would we need to get back to the entropy ?

My understanding is that Recover is for when you lost your key or your password. So you can use the mnemonic to recover you key from that entropy (and restore it/save it with a new PW). At least this is what Recover looks like (which is called when you run gaiacli keys add --recover my-fancy-key-name):

// Recover converts a seedphrase to a private key and persists it,
// encrypted with the given passphrase. Functions like Create, but
// seedphrase is input not output.
func (kb dbKeybase) Recover(name, passphrase, seedphrase string) (Info, error) {
words := strings.Split(strings.TrimSpace(seedphrase), " ")
secret, err := kb.codec.WordsToBytes(words)
if err != nil {
return Info{}, err
}
// secret is comprised of the actual secret with the type
// appended.
// ie [secret] = [type] + [secret]
typ, secret := secret[0], secret[1:]
algo := byteToCryptoAlgo(typ)
priv, err := generate(algo, secret)
if err != nil {
return Info{}, err
}
// encrypt and persist key.
public := kb.writeKey(priv, name, passphrase)
return public, err
}

@liamsi
Copy link
Contributor

liamsi commented May 29, 2018

Instead, we should have the output be something like ed25519{monkey tree house ... } so that the type is explicit and not encoded into the seed (the seed must be compatible with the bip39 standard and the fundraiser ...).

bip32 / the fundraiser only operates on secp256k1. Do we really want to have different key-types here? Is the code used somewhere else (e.g. for signing) and not only for the wallet? Does it make sense to have the ed25519 option?

@ebuchman
Copy link
Contributor Author

No. I think you're right - since we're only focused on secp256k1 for users for now, we don't need ed25519 here, since this isn't where we manage validator keys.

We can let the KMS become the ed25519 key-management service.

That said, as work on HD-ed25519 and rostretto picks up, we will want to enable them for users through this keybase, so it will be valuable to have a way to handle different key types.

@ebuchman
Copy link
Contributor Author

Tendermint uses two ed25519 keys that it currently just saves in unencrypted files on disk: priv_Validator.json and node_key.json.

We could consider supporting these in the keybase, but for these keys we only want to enter the passphrase once, not every time we sign - or at least unlock it for some amount of time.

We're building the KMS (https://github.com/tendermint/kms) to manage the validator keys in a way that targets many different HSMs.

@liamsi
Copy link
Contributor

liamsi commented May 29, 2018

That's very helpful. Thanks!

That said, as work on HD-ed25519 and rostretto picks up, we will want to enable them for users through this keybase, so it will be valuable to have a way to handle different key types.

I'm wondering if wouldn't it make sense to add support for ed25519 keys (and others) later, as we really need them and keep the API here as simple as possible for now. As soon as we exactly know how we'll use other key-types, we can either generalize the public facing API here in go-crypto, or, even have a separate API/package for these other purpose(s). I prefer the latter approach.

I will check out where and how gaia creates the ed25519 keys before making any decision though.

@ebuchman
Copy link
Contributor Author

Sounds good.

Gaia basically does the same thing as Tendermint.

PrivVal key generated here: https://github.com/tendermint/tendermint/blob/master/cmd/tendermint/commands/init.go#L34

NodeKey priv key is here: https://github.com/tendermint/tendermint/blob/master/node/node.go#L394

We also have another key for authenticated encryption with the validator signing process: https://github.com/tendermint/tendermint/blob/master/node/node.go#L179

@bogrod
Copy link

bogrod commented Jun 5, 2018

Hello, I'm currently working on implementing an alternative signature scheme to ECDSA for use in tendermint/cosmos which would allow people to launch tokens and other projects secured with a quantum resistant lattice-based signature scheme BLISS https://github.com/bogrod/bliss This could also be used in the future to increase security of validator keys (also looking at SDK to bring BLISS to HSMs).

The ability to use alternative signature schemes in the tendermint/cosmos ecosystem could be beneficial to long-term resilience and encourage more projects to build on the network.

I'm interested in finding a way to be compliant with BIP 32/39 while still allowing flexibility to add alternative signature schemes in the future. What are you thoughts on continuing to use the concept of key-types in go-crypto and just default to secp256k1? Or do you think this is best done as a separate library? Thanks!

@liamsi
Copy link
Contributor

liamsi commented Jun 5, 2018

Hey @bogrod 👋 that sounds pretty interesting! And yes, not only relying on secp256k1 would be neat. That said ...

What are you thoughts on continuing to use the concept of key-types in go-crypto and just default to secp256k1? Or do you think this is best done as a separate library? Thanks!

I'm currently redesigning the API and I asked myself the same question. We will definitely default to secp256k1. To be honest, I also think, we should drop the concept of key-types at least for the keys package. I do not see us moving to another curve for end-user wallets anytime soon. For the sake of separating concerns we should add any other functionality (e.g. signing with validator keys) to separate libraries/packages.

@liamsi
Copy link
Contributor

liamsi commented Jul 3, 2018

It does now via #118

@liamsi liamsi closed this as completed Jul 3, 2018
@ValarDragon
Copy link

ValarDragon commented Jul 3, 2018

To be honest, I also think, we should drop the concept of key-types at least for the keys package.

Wait, are we sure that we don't want to encode the keytype with the key? I think if we don't do that, then it will be harder to change key types in the future. I'm not suggesting supporting multiple key types / hd derivation for all keys. I'm just suggesting that it be possible to derive the key type from the encoded form. (Unless I misunderstood the above)

@liamsi
Copy link
Contributor

liamsi commented Jul 3, 2018

Hmm, currently we only support one key-type using this API (secp256k1). Bip39 currently only supports secp256k1. There are efforts to make this work with other curves (like our beloved ed25519) but this is currently not the case. Should we move this discussion to cosmos-sdk?

@ValarDragon
Copy link

Moving the discussion to a new issue in the sdk sounds good to me!

@liamsi
Copy link
Contributor

liamsi commented Jul 3, 2018

You could still switch-case over the type of the key btw (encoding the type in the seed might be dangerous as mentioned here for example: #116)

@ValarDragon
Copy link

ValarDragon commented Jul 3, 2018

Oh cool. Just wanted to make sure that amino encoded the type of the key. Since it does, that alleviates my concern of it being hard to add new key types in the future. Thanks for the clarification!

@liamsi
Copy link
Contributor

liamsi commented Jul 3, 2018

Ah OK, I see where you're coming from. Yeah, the keys are registered in the amino coded 👍

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants