Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: add peer id spec #100

Merged
merged 24 commits into from
Jun 20, 2019
Merged

docs: add peer id spec #100

merged 24 commits into from
Jun 20, 2019

Conversation

ghost
Copy link

@ghost ghost commented Oct 10, 2018

This updates the peer ID spec to explain what keypairs are supported and how peer IDs are encoded for each key type. Thanks to @Stebalien for figuring this out with me.

@ghost ghost self-assigned this Oct 10, 2018
@ghost ghost added the in progress label Oct 10, 2018
@ghost ghost requested review from jhiesey and raulk October 10, 2018 16:13
@ghost
Copy link
Author

ghost commented Oct 10, 2018

In the spirit of learning and spreading knowledge, I have chosen @jhiesey and @raulk as reviewers of this PR.

peer-ids/peer-ids.md Outdated Show resolved Hide resolved
peer-ids/peer-ids.md Outdated Show resolved Hide resolved
@Stebalien
Copy link
Member

We should probably say: Implementations SHOULD support RSA and Ed25519. Implementations MAY support Secp256k1 and ECDSA but nodes using those keys may not be able to connect to all nodes.

@marten-seemann
Copy link
Contributor

+1 for using normative language

@Stebalien Stebalien mentioned this pull request Oct 24, 2018
@ghost ghost mentioned this pull request Nov 19, 2018
16 tasks
@whyrusleeping
Copy link
Contributor

Any update here? I'd love to have some specs around here to link to ;)

@tomaka tomaka mentioned this pull request Feb 28, 2019
Copy link
Contributor

@marten-seemann marten-seemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the interest of getting our specs PRs merged, I did a review of this one. Overall, this LGTM, I just think we can delete a few sentences that specify implementation details.

## Keys


Our key pairs are stored on disk using a simple protobuf defined in [libp2p/go-libp2p-crypto/pb/crypto.proto#L5](https://github.com/libp2p/go-libp2p-crypto/blob/master/pb/crypto.proto#L5):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The specs shouldn't link to code, it should be the other way round.
  2. How keys are stored on discs doesn't need to be specified, it's an implementation decision. We only need to specify things that effect the interoperability of implements.

3. If the length of the serialized bytes <= 42, then we compute the "identity" multihash of the serialized bytes. In other words, no hashing is performed, but the [multihash format is still followed](https://github.com/multiformats/multihash) (byte plus varint plus serialized bytes). The idea here is that if the serialized byte array is short enough, we can fit it in a multihash proto without having to condense it using a hash function.
4. If the length is >42, then we hash it using it using the SHA256 multihash.

For more information, refer to this block in [libp2p/go-libp2p-peer/peer.go](https://github.com/libp2p/go-libp2p-peer/blob/master/peer.go):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. I think the text already describes the logic pretty well, so we don't need to cite this comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree.


Implementations SHOULD support RSA and Ed25519. Implementations MAY support Secp256k1 and ECDSA, but nodes using those keys may not be able to connect to all other nodes.

Keys are passed around in code as byte arrays. Keys are encoded within these arrays differently depending on the type of key.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like an implementation decision. Remove this sentence?


To sign a message, we first hash it with SHA-256 and then sign it using the RSASSA-PKCS1-V1.5-SIGN from RSA PKCS#1 v1.5.

See [libp2p/go-libp2p-crypto/rsa.go](https://github.com/libp2p/go-libp2p-crypto/blob/master/rsa.go) for details
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.


Ed25519 signatures follow the normal Ed25519 standard.

See [libp2p/go-libp2p-crypto/ed25519.go](https://github.com/libp2p/go-libp2p-crypto/blob/master/ed25519.go) for details
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here.


To sign a message, we hash the message with SHA 256, and then sign it with the ECDSA standard algorithm, then we encode it using DER-encoded ASN.1.

See [libp2p/go-libp2p-crypto/ecdsa.go](https://github.com/libp2p/go-libp2p-crypto/blob/master/ecdsa.go) for details.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here.


To sign a message, we hash the message with SHA 256, then sign it using the standard Bitcoin EC signature algorithm (BIP0062), and then use standard Bitcoin encoding.

See [libp2p/go-libp2p-crypto/secp256k1.go](https://github.com/libp2p/go-libp2p-crypto/blob/master/secp256k1.go) for details.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here.

peer-ids/peer-ids.md Show resolved Hide resolved

We do not do any special additional encoding for Ed25519 public keys.

The encoding for Ed25519 private keys is a little unusual. There are two formats that we encourage implementors to support:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an implementation decision, so we probably don't need to specify it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not entirely. We do want users to be able to port keys from one implementation to another.

}
```

As should be apparent from the above code block, this proto simply encodes for transmission a public/private key pair along with an enum specifying the type of keypair.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any situation where we want to transmit the PrivateKey? That seems... dangerous. If not, we don't need to specify the PrivateKey here at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, storage of private key is implementation specific, so no need to cover them in this doc I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, users do need to be able to take their private keys with them (especially because we use these for things like IPNS).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that removing the private key format from this doc leaves a gap. We still need to specify somewhere how we handle them.

We could bring back the private key references and add a call-out at the top of the doc that they're not related to peer-id calculation and are shown for reference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really, we should probably rename this doc to the "libp2p key spec" and make peer ID calculation a part of that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for that

peer-ids/peer-ids.md Outdated Show resolved Hide resolved
1. Encode the public key into the protobuf.
2. Serialize the protobuf containing the public key into bytes using the [canonical protobuf encoding](https://developers.google.com/protocol-buffers/docs/encoding).
3. If the length of the serialized bytes <= 42, then we compute the "identity" multihash of the serialized bytes. In other words, no hashing is performed, but the [multihash format is still followed](https://github.com/multiformats/multihash) (byte plus varint plus serialized bytes). The idea here is that if the serialized byte array is short enough, we can fit it in a multihash proto without having to condense it using a hash function.
4. If the length is >42, then we hash it using it using the SHA256 multihash.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should say something about how these are commonly represented as strings: base58btc encoding raw, without using multibase.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a bit about base58btc, but didn't mention multibase, since we hadn't defined it yet in the doc. Should I bring it up? I think if people are likely to expect Peer Ids to use multibase we should clarify.

3. If the length of the serialized bytes <= 42, then we compute the "identity" multihash of the serialized bytes. In other words, no hashing is performed, but the [multihash format is still followed](https://github.com/multiformats/multihash) (byte plus varint plus serialized bytes). The idea here is that if the serialized byte array is short enough, we can fit it in a multihash proto without having to condense it using a hash function.
4. If the length is >42, then we hash it using it using the SHA256 multihash.

For more information, refer to this block in [libp2p/go-libp2p-peer/peer.go](https://github.com/libp2p/go-libp2p-peer/blob/master/peer.go):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree.

peer-ids/peer-ids.md Outdated Show resolved Hide resolved
}
```

As should be apparent from the above code block, this proto simply encodes for transmission a public/private key pair along with an enum specifying the type of keypair.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, storage of private key is implementation specific, so no need to cover them in this doc I think.

@raulk
Copy link
Member

raulk commented Mar 13, 2019

@yusefnapora if you wanna do some spec herding, this is a quick win I think. Pretty good consensus.

@yusefnapora
Copy link
Contributor

I did a quick edit to remove references to private keys and serialization on disk. I also removed links to go code and added some links to e.g. the RSA signing spec, etc.

@mgoelzer do you mind if I push changes to this branch? I put up a new one here with the edits: https://github.com/libp2p/specs/blob/edit/peer-ids/peer-ids/peer-ids.md

but it might be easier to discuss if I drop the commits here.

@raulk
Copy link
Member

raulk commented Mar 14, 2019

@yusefnapora gonna jump in and say yes. In the interest of moving forward, push to the branch in this PR. Thanks!

@ghost ghost assigned yusefnapora Mar 14, 2019
Co-Authored-By: Stebalien <steven@stebalien.com>
Adds "encode to byte array according to rules below" as first step,
and makes explicit that we only use the public part of the keypair.
@yusefnapora
Copy link
Contributor

Sorry for sleeping on this for a while everyone :)

I added a few commits to address some feedback. I think the most significant is the note about deterministic protobuf encoding. It basically says determinism is "desirable" and you should try to make it happen, but doesn't call it out as a MUST. Without requiring a bunch of changes to the protobuf spec, that might be the best we can do, but if anyone has a better way to put this, I'm all ears.

@yusefnapora
Copy link
Contributor

@Stebalien @arnetheduck @raulk - could you guys help me figure out the resolution to the deterministic encoding problem?

If we definitely want to require a consistent / canonical encoding for peer ids, then I think I should write up a precise spec that requires a certain field ordering, etc. And we can have some tests that ensure your encoder handles edge cases well.

But @Stebalien mentions that, because we're also not guaranteeing a canonical encoding for the key Data field, it's not really relevant at this layer. And that it's fine to have multiple valid peer ids for the same key, which seems contentious.

I can make up some arguments in favor of this view; for one, we can extend the PublicKey message in the future without having to decide now how to encode unknown fields in our "not-quite-protobuf" format. It could also be up to an application to decide whether to reject multiple peer ids that all derive from the same public key. It's also much simpler, of course, since the easiest problem to solve is the one you don't have.

@Stebalien could you elaborate a bit on your view? I think we should figure out if this is a blocker or not to merging the spec.

@raulk
Copy link
Member

raulk commented May 20, 2019

On mobile.

Re: deterministic key serde. I suggest we specify the format as proto3 + the extra requirements to reach a deterministic result (ordering, no unrecognised fields, no duplicates last wins, etc.) We should add an implementers note in the form of a SHOULD recommendation to use an OOTB protobuf encoder, where possible, and provide test vectors.

For cases where that’s unfeasible, we should provide a boiled down serde spec in BNF or similar form. It’ll be super simple, the schema is so short and constrained we can express the serialised form manually without alluding to proto3 at all.

Re: using unrecognised fields for peer ID calculation, I’d like to hear what use cases you had in mind @Stebalien. In my view, the peer ID should be derived from the pubkey modulo metadata, if any. I don’t think user-defined metadata should yield a different identity. Seems like opening a trivial attack vector for sybils.

@Stebalien
Copy link
Member

In my view, the peer ID should be derived from the pubkey modulo metadata, if any. I don’t think user-defined metadata should yield a different identity.

Note: I'm really not expecting much if any user defined metadata. However, we may want to add new fields in the future and it's hard to do this if we don't include them in the hash.

  1. We (IPFS/IPLD) would like to to be able to convert peer IDs to CIDs (and, ideally, fetch keys over bitswap).
  2. Currently, peer IDs always map to exactly one public key. If we both allow other metadata and don't hash it, we'll need to handle merging metadata.
  3. The un-hashed metadata won't be authenticated. When we connect to a peer, they send us their key. We then compute the peer ID of that key and check if it matches the expected one. If we ignore metadata, we could end up with inconsistent metadata views.

I don’t think user-defined metadata should yield a different identity. Seems like opening a trivial attack vector for sybils.

There are two cases:

  • The key's owner generates multiple identities. This is a non-issue, they can already generate more keys.
  • An attacker generates multiple identities for a victim: We'll detect this on connect because we'll compute the victim's peer ID from the key they give us. Ideally, keys would be self signed but our use-case for these keys is simple enough that this probably isn't that big of an issue (we should keep using this system for identifying libp2p peers but should come up with something better for identity).

@Stebalien
Copy link
Member

But @Stebalien mentions that, because we're also not guaranteeing a canonical encoding for the key Data field, it's not really relevant at this layer. And that it's fine to have multiple valid peer ids for the same key, which seems contentious.

Same Key

IMO, "same key" should mean "same bytes". That is, if I change anything about the bytes of the serialized key, I get a new key.

I'm concerned that not all key formats will have a "canonical" encoding, some libraries may strip certain metadata while others preserve it, etc. This will lead to hard-to-track-down bugs. This has already been a real pain for us in IPLD and our solution there is to avoid re-serializing unless we change something.

My unconcern with having multiple valid peer IDs for the same underlying cryptographic key is that there's likely nothing we can do to stop this. I haven't audited our key formats/algorithms but it's likely possible to make some small changes to a public key and have it continue to work with the private key.

@yusefnapora
Copy link
Contributor

@raulk, what do you think of the header in 6c4a587 - I think the markdown table is a decent way to present the main status information.

I figure we can hammer out details here and I'll write up a little spec for the header format once we like it.

@yusefnapora
Copy link
Contributor

Also, @raulk, @vyzo & @Stebalien I nominated you guys as the Interest Group for this one 😄

Others are welcome to join in if they like

@Zolmeister
Copy link

I would like to reiterate the value in migrating to embedded-key base32 encoded canonical representation of PeerIDs
Ref. #139

@yusefnapora
Copy link
Contributor

How do we feel about merging this?

870b71a kicks the deterministic encoding issue down the road by saying a future version of the spec may require more strict encoding than the protobuf spec, but until then, don't extend the PublicKey message and use a protobuf encoder that writes fields in order.

@mgoelzer @Stebalien @raulk @vyzo

@folex
Copy link

folex commented May 4, 2023

Sorry for commenting on an old PR, but I have not found an answer anywhere else.

What's the motivation for number 42 here?

keys that serialize to at most 42 bytes must be hashed using the "identity" multihash codec.

When one would expect keys to be longer than 42 bytes?

Thank you!

@Winterhuman
Copy link

Winterhuman commented May 5, 2023

I believe it's the size of ed25519 public keys inside of protobuf encoding, since ed25519 keys are supposed to be inlined into PeerIDs, so any keypair algorithm with public keys larger than 32 bytes would be encoded as more than 42 bytes and therefore not be inlined

@folex
Copy link

folex commented May 5, 2023

uh, so it's 32 + 10 bytes.

30% for serializaton is such a big number, I didn't even think about that.

thank you, that makes it clearer🙏

@mxinden
Copy link
Member

mxinden commented May 8, 2023

And I vaguely remember that 42 would be small enough to then fit into a single DNS segment. Though I might be misremembering that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.