docs: add peer id spec #100

Merged
merged 24 commits into from
Jun 20, 2019
Changes from 6 commits
24 commits
714a6c7
docs: add peer id spec
Oct 10, 2018
565767a
docs: clean up writing
Oct 10, 2018
902fbfe
docs: fix @vyzo comment
Oct 10, 2018
95c2354
docs: syntax highlighting
Oct 10, 2018
d8459bc
Key types should/may
Oct 11, 2018
e2dfbe2
clarify 42 byte rule
Oct 12, 2018
6c318c9
remove references to private keys & storage formats
yusefnapora Mar 14, 2019
878f2fa
remove links to go impl, add links to specs
yusefnapora Mar 14, 2019
eda2295
peer ids: language nit
raulk Mar 15, 2019
242afbe
bring back private keys, add context about serialization
yusefnapora Mar 19, 2019
52057ac
base key types MUST be supported
yusefnapora Mar 19, 2019
3302991
peer id: implementations may configure key types
raulk Mar 27, 2019
f277f41
note that we're using proto2
yusefnapora May 8, 2019
046c7e8
soon has come :)
yusefnapora May 8, 2019
9bfb370
mention we're not using multibase for peer-ids
yusefnapora May 8, 2019
a7de2f6
tweak the description of peer id generation
yusefnapora May 8, 2019
1237100
add note about deterministic encoding of PublicKey protobuf
yusefnapora May 8, 2019
870b71a
revise note about deterministic encoding
yusefnapora May 22, 2019
d14a44d
update status & generate TOC
yusefnapora May 22, 2019
10043ec
fix TOC
yusefnapora May 22, 2019
6c4a587
update status header
yusefnapora May 23, 2019
2ec0867
use shortcut reference links for authors in header
yusefnapora May 27, 2019
5173834
Merge master into feat/peer-ids
yusefnapora Jun 19, 2019
ed01eb1
add peer id spec to index
yusefnapora Jun 19, 2019
117 changes: 117 additions & 0 deletions peer-ids/peer-ids.md
@@ -0,0 +1,117 @@
# Spec: Peer Ids and Keys

## Keys


Our key pairs are stored on disk using a simple protobuf defined in [libp2p/go-libp2p-crypto/pb/crypto.proto#L5](https://github.com/libp2p/go-libp2p-crypto/blob/master/pb/crypto.proto#L5):
Contributor

  1. The specs shouldn't link to code, it should be the other way round.
  2. How keys are stored on disk doesn't need to be specified; it's an implementation decision. We only need to specify things that affect the interoperability of implementations.


```protobuf
enum KeyType {
  RSA = 0;
  Ed25519 = 1;
  Secp256k1 = 2;
  ECDSA = 3;
}

message PublicKey {
  required KeyType Type = 1;
  required bytes Data = 2;
}

message PrivateKey {
  required KeyType Type = 1;
  required bytes Data = 2;
}
```

As should be apparent from the above code block, this proto simply encodes for transmission a public/private key pair along with an enum specifying the type of keypair.
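
To make the wire format concrete, here is a minimal Python sketch that serializes a `PublicKey` message by hand. The function names are hypothetical, and writing the fields in field-number order is an assumption made here so that the output is byte-stable; this draft does not state an ordering rule.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_public_key(key_type: int, data: bytes) -> bytes:
    """Serialize PublicKey{Type, Data} with fields in field-number order.

    Field 1 (Type) uses wire type 0 (varint), so its tag is (1 << 3) | 0 = 0x08.
    Field 2 (Data) uses wire type 2 (length-delimited), so its tag is 0x12.
    Both fields are `required`, so Type is written even when it is 0 (RSA).
    """
    return (
        b"\x08" + encode_varint(key_type)
        + b"\x12" + encode_varint(len(data)) + data
    )
```

For a 32-byte Ed25519 key (`KeyType` 1) this produces a 36-byte message that starts with `08 01 12 20`.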
Contributor

Is there any situation where we want to transmit the PrivateKey? That seems... dangerous. If not, we don't need to specify the PrivateKey here at all.

Member

Yeah, storage of private key is implementation specific, so no need to cover them in this doc I think.

Member

Unfortunately, users do need to be able to take their private keys with them (especially because we use these for things like IPNS).

Contributor

It's true that removing the private key format from this doc leaves a gap. We still need to specify somewhere how we handle them.

We could bring back the private key references and add a call-out at the top of the doc that they're not related to peer-id calculation and are shown for reference.

Member

Really, we should probably rename this doc to the "libp2p key spec" and make peer ID calculation a part of that.

Contributor

👍 for that


#### Where are keys used?

Keys are used in two places in libp2p. The first is for signing messages. Here are some examples of messages we sign:
- IPNS records
- PubSub messages (coming soon)
- SECIO handshake

The second is for generating peer ids; this is discussed in the section below.

## Peer Ids

Here is the process by which we generate peer ids from the public key of the keypairs described above:

1. Encode the public key into the protobuf.
2. Serialize the protobuf containing the public key into bytes using the [canonical protobuf encoding](https://developers.google.com/protocol-buffers/docs/encoding).
Contributor

1=0,2=bytes and 2=bytes,1=0 are two equally valid encodings for an RSA key. Indeed, so is 1=2,2=bytes,1=0 given that the last value overrides the first, per the protobuf spec. Protobuf is not deterministic in its encoding - this would be a problem in general.

Contributor

Hmm, that's true. From the linked doc:

When a message is serialized, there is no guaranteed order for how its known or unknown fields should be written.

Are we doing anything to enforce deterministic field ordering? If not, maybe we should be...

Contributor

Some notes about the lack of deterministic serialization for protobuf: https://gist.github.com/kchristidis/39c8b310fd9da43d515c4394c3cd9510

This is a thorny issue... a libp2p implementor could do everything according to the spec and still end up with peer ids that other peers wouldn't validate, if their protobuf implementation happened to order the keys differently.

Contributor

It helps to think of a protobuf message as a hash map, essentially - semantically it's a very similar situation - the last thing you put in remains, and the order is unspecified.

In fact, that's a cheap and easy way to implement it - I'd expect dynamic languages with strong dictionary implementations to do it this way.

Locking down field order solves the issue, but I'm not sure I'd call it protobuf any more at that point - perhaps protobuf-compatible.

Contributor

Yeah, I'm not crazy about requiring semantics on top of protobuf - at that point I'd rather just specify a different serialization that is deterministic. Then you have a backwards compatibility problem though...

@marten-seemann do you have any concerns about the non-deterministic encoding for this public key structure? Asking because it gets embedded in the certificate extension for the TLS 1.3 spec you're working on.

Member

The way I interpret this line, especially since it's explicitly stated in their spec, is that regardless of what implementations happen to be doing today, there's no guarantee that they will do so tomorrow.

Specifically, it says:

Serialization order is an implementation detail and the details of any particular implementation may change in the future.

(i.e., implementation defined)

Annoyingly, they actually changed this. It used to say:

While you can use field numbers in any order in a .proto, when a message is serialized its known fields should be written sequentially by field number, as in the provided C++, Java, and Python serialization code.

Which is why we relied on this. All fully-featured and correct implementations ensured their field orders matched.

in protobuf 3 there's no distinction between a default value (0) and an explicitly set value

We currently require proto2 (these fields are actually marked as required which doesn't exist in proto3) but yeah, implementers may ignore this.

One of the reasons I've seen people use protobuf is to have the ability to re-serialize - it was a design goal of the language, and manifests itself as the "UnknownFields" map in the C++ API, for example. Prohibiting this takes the spec another step away from this being protobuf. Granted, it's likely not the first thing someone will want to do with a peer id specifically..

IMO, that's a design goal when editing. Any edits will change the peer ID anyways.


My worry with saying "this is not protobuf but you can use a protobuf decoder" is that people will use protobuf encoders anyways because they'll usually "just work". Then, when the underlying protobuf implementation changes, bad things will happen.

Member (@raulk, Mar 27, 2019)

@Stebalien

My worry with saying "this is not protobuf but you can use a protobuf decoder" is that people will use protobuf encoders anyways because they'll usually "just work". Then, when the underlying protobuf implementation changes, bad things will happen.

We can include test vectors for that. It's fair for implementors to use protobuf encoders, and I expect that to be the dominant implementation approach. However, if between version X and Y a specific encoder changed in a way that it broke our requirement, that's when the implementor would have to craft a manual encoding. It's a fair trade-off. We can provide guidance in this spec (in the form of "implementor's notes").

https://web.archive.org/web/20140215063318/https://developers.google.com/protocol-buffers/docs/encoding#order
Which is why we relied on this. All fully-featured and correct implementations ensured their field orders matched.

Woah, thanks for tracing that back 😓

Either way but I would like to be able to add additional fields in the future (which is problematic with respect to re serializing).

IIUC, you'd like nodes to tolerate unrecognised fields and use the serialised form verbatim, as transmitted by the other party, when calculating the peer ID?

Isn't this dangerous? This creates a surface for polymorphic peer IDs, where node A with pubkey P could produce infinite peer ID (or craft specific ones) for itself – and hence break protocols that rely on peer IDs – merely by setting random fields in the protobuf.

Member

Isn't this dangerous? This creates a surface for polymorphic peer IDs, where node A with pubkey P could produce infinite peer ID (or craft specific ones) for itself – and hence break protocols that rely on peer IDs – merely by setting random fields in the protobuf.

This should be just fine.

  1. You can probably already do this. We don't make any effort to ensure that the data part of the key is completely deterministic.
  2. Any change to the key metadata will, effectively, create a new identity. IMO, that's exactly what we want.

Member

We don't make any effort to ensure that the data part of the key is completely deterministic.

Could you elaborate?

Any change to the key metadata will, effectively, create a new identity. IMO, that's exactly what we want.

That feels wrong. Identity should be derived from cryptographic material (a pubkey), not from random metadata.

Contributor

Not sure exactly what @Stebalien is referring to about the data field, but the spec describes one case where the same public key could lead to different data fields (and thus peer ids).

Since there are two supported methods for encoding Ed25519 keys, you could end up with different peer ids for the same key depending on which you choose.

3. If the length of the serialized bytes <= 42, then we compute the "identity" multihash of the serialized bytes. In other words, no hashing is performed, but the [multihash format is still followed](https://github.com/multiformats/multihash) (byte plus varint plus serialized bytes). The idea here is that if the serialized byte array is short enough, we can fit it in a multihash proto without having to condense it using a hash function.
Stebalien marked this conversation as resolved.
4. If the length is >42, then we hash it using the SHA256 multihash.
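
Steps 3 and 4 can be sketched in Python as follows. This is an illustrative sketch rather than a reference implementation: the function names are hypothetical, and the input is assumed to be the already-serialized `PublicKey` protobuf from step 2. The multihash codes used are `0x00` for identity and `0x12` for sha2-256.

```python
import hashlib

MAX_INLINE_KEY_LENGTH = 42  # threshold from step 3

IDENTITY_CODE = 0x00  # multihash code for "identity" (no hashing)
SHA2_256_CODE = 0x12  # multihash code for sha2-256


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def peer_id_multihash(serialized_public_key: bytes) -> bytes:
    """Return the peer id as multihash bytes: <code><digest length><digest>."""
    if len(serialized_public_key) <= MAX_INLINE_KEY_LENGTH:
        # Short keys are inlined: the "digest" is the serialized key itself.
        code, digest = IDENTITY_CODE, serialized_public_key
    else:
        code, digest = SHA2_256_CODE, hashlib.sha256(serialized_public_key).digest()
    return encode_varint(code) + encode_varint(len(digest)) + digest
```

A 36-byte serialized Ed25519 key is therefore inlined behind the two-byte prefix `00 24`, while a longer RSA key yields a 34-byte value beginning with `12 20`.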
Member

We should say something about how these are commonly represented as strings: base58btc encoding raw, without using multibase.

Contributor

I added a bit about base58btc, but didn't mention multibase, since we hadn't defined it yet in the doc. Should I bring it up? I think if people are likely to expect Peer Ids to use multibase we should clarify.


For more information, refer to this block in [libp2p/go-libp2p-peer/peer.go](https://github.com/libp2p/go-libp2p-peer/blob/master/peer.go):
Contributor

Same here. I think the text already describes the logic pretty well, so we don't need to cite this comment.

Member

Agree.


```go
// MaxInlineKeyLength is the maximum length a key can be for it to be inlined in
// the peer ID.
//
// * When `len(pubKey.Bytes()) <= MaxInlineKeyLength`, the peer ID is the
// identity multihash hash of the public key.
// * When `len(pubKey.Bytes()) > MaxInlineKeyLength`, the peer ID is the
// sha2-256 multihash of the public key.
const MaxInlineKeyLength = 42
```


## How Keys are Encoded and Messages Signed

Four key types are supported:
- RSA
- Ed25519
- Secp256k1
- ECDSA

Implementations SHOULD support RSA and Ed25519. Implementations MAY support Secp256k1 and ECDSA, but nodes using those keys may not be able to connect to all other nodes.
raulk marked this conversation as resolved.

Keys are passed around in code as byte arrays. Keys are encoded within these arrays differently depending on the type of key.
Contributor

That seems like an implementation decision. Remove this sentence?


The following sections describe each key type's encoding rules.

### RSA

We encode the public key using the DER-encoded PKIX format.

We encode the private key as a PKCS1 key using ASN.1 DER.

To sign a message, we first hash it with SHA-256 and then sign it using the RSASSA-PKCS1-v1_5 signature scheme from RSA PKCS#1 v1.5.
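
As a rough illustration of the hash-then-sign flow, the following Python sketch builds the EMSA-PKCS1-v1_5 encoding for SHA-256 and applies the raw RSA operation. All names are hypothetical, and the key is a toy built from two Mersenne primes purely so the example is self-contained; it is not secure, and real code should use a vetted cryptography library.

```python
import hashlib

# DER prefix of the SHA-256 DigestInfo structure (PKCS #1 / RFC 8017).
SHA256_DIGEST_INFO_PREFIX = bytes.fromhex("3031300d060960864801650304020105000420")


def emsa_pkcs1_v1_5_encode(message: bytes, em_len: int) -> bytes:
    """Hash with SHA-256 and pad to em_len bytes: 00 01 FF..FF 00 DigestInfo."""
    t = SHA256_DIGEST_INFO_PREFIX + hashlib.sha256(message).digest()
    if em_len < len(t) + 11:
        raise ValueError("modulus too short for a SHA-256 DigestInfo")
    return b"\x00\x01" + b"\xff" * (em_len - len(t) - 3) + b"\x00" + t


def rsa_sign(message: bytes, n: int, d: int) -> bytes:
    """Sign message with private exponent d and modulus n."""
    k = (n.bit_length() + 7) // 8
    em = int.from_bytes(emsa_pkcs1_v1_5_encode(message, k), "big")
    return pow(em, d, n).to_bytes(k, "big")


def rsa_verify(message: bytes, signature: bytes, n: int, e: int) -> bool:
    """Check signature by recomputing the expected encoded message."""
    k = (n.bit_length() + 7) // 8
    if len(signature) != k:
        return False
    recovered = pow(int.from_bytes(signature, "big"), e, n).to_bytes(k, "big")
    return recovered == emsa_pkcs1_v1_5_encode(message, k)


# Toy key for illustration only (NOT secure): two Mersenne primes.
P, Q, E = 2**521 - 1, 2**607 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # modular inverse; needs Python 3.8+
```

With the toy key, `rsa_verify(msg, rsa_sign(msg, N, D), N, E)` holds for any `msg`, and changing the message invalidates the signature.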

See [libp2p/go-libp2p-crypto/rsa.go](https://github.com/libp2p/go-libp2p-crypto/blob/master/rsa.go) for details.
Contributor

Same here.


### Ed25519

Ed25519 specifies the exact format for keys and signatures, so we do not do much additional encoding, except as noted below.

We do not do any special additional encoding for Ed25519 public keys.

The encoding for Ed25519 private keys is a little unusual. There are two formats that we encourage implementors to support:
Contributor

This seems like an implementation decision, so we probably don't need to specify it.

Member

Not entirely. We do want users to be able to port keys from one implementation to another.


- Preferred method is a simple concatenation: `[private key bytes][public key bytes]` (64 bytes)
- Older versions of the libp2p code used the following format: `[private key][public key][public key]` (96 bytes). If you encounter this type of encoding, the proper way to process it is to compare the two public key strings (32 bytes each) and verify they are identical. If they are, then proceed as you would with the preferred method. If they do not match, reject or error out because the byte array is invalid.
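
The two private key layouts above could be handled along these lines. This is a minimal Python sketch; the function name and error messages are hypothetical.

```python
import hmac
from typing import Tuple

KEY_LEN = 32  # each private key and public key segment is 32 bytes


def split_ed25519_private_key(encoded: bytes) -> Tuple[bytes, bytes]:
    """Return (private key bytes, public key bytes) from either layout."""
    if len(encoded) == 2 * KEY_LEN:
        # Preferred format: [private key bytes][public key bytes]
        return encoded[:KEY_LEN], encoded[KEY_LEN:]
    if len(encoded) == 3 * KEY_LEN:
        # Legacy format: [private key][public key][public key]
        private = encoded[:KEY_LEN]
        public = encoded[KEY_LEN:2 * KEY_LEN]
        redundant = encoded[2 * KEY_LEN:]
        if not hmac.compare_digest(public, redundant):
            raise ValueError("redundant public key does not match")
        return private, public
    raise ValueError("expected 64 or 96 bytes, got %d" % len(encoded))
```

Both layouts yield the same `(private, public)` pair, so callers do not need to know which format was stored.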

Ed25519 signatures follow the normal Ed25519 standard.

See [libp2p/go-libp2p-crypto/ed25519.go](https://github.com/libp2p/go-libp2p-crypto/blob/master/ed25519.go) for details.
Contributor

And here.


### Secp256k1

We use the standard Bitcoin EC encoding for Secp256k1 public and private keys.
bigs marked this conversation as resolved.

To sign a message, we hash the message with SHA-256, sign it using the standard Bitcoin EC signature algorithm (BIP0062), and then use the standard Bitcoin encoding.

See [libp2p/go-libp2p-crypto/secp256k1.go](https://github.com/libp2p/go-libp2p-crypto/blob/master/secp256k1.go) for details.
Contributor

And here.


### ECDSA

We encode the public key using ASN.1 DER.

We encode the private key using DER-encoded PKIX.

To sign a message, we hash the message with SHA-256, sign it with the standard ECDSA algorithm, and then encode the signature using DER-encoded ASN.1.
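
The final encoding step can be sketched in Python as below. It assumes the conventional ASN.1 layout for ECDSA signatures, a SEQUENCE of the two INTEGERs `r` and `s`, which this draft does not spell out. The function names are hypothetical, and producing `r` and `s` is left to a cryptography library.

```python
def der_length(length: int) -> bytes:
    """Encode a DER length in short form (< 128) or long form."""
    if length < 0x80:
        return bytes([length])
    body = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_integer(value: int) -> bytes:
    """Encode a non-negative integer as a minimal DER INTEGER (tag 0x02)."""
    if value < 0:
        raise ValueError("signature components must be non-negative")
    # Adding 8 bits before dividing leaves room for a leading zero byte
    # whenever the top bit is set, so the value is never read as negative.
    body = value.to_bytes((value.bit_length() + 8) // 8, "big")
    return b"\x02" + der_length(len(body)) + body


def encode_ecdsa_signature(r: int, s: int) -> bytes:
    """Encode (r, s) as a DER SEQUENCE (tag 0x30) of two INTEGERs."""
    content = der_integer(r) + der_integer(s)
    return b"\x30" + der_length(len(content)) + content
```

Because DER integers are variable-length, the encoded signature size varies slightly between signatures made with the same key.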

See [libp2p/go-libp2p-crypto/ecdsa.go](https://github.com/libp2p/go-libp2p-crypto/blob/master/ecdsa.go) for details.
Contributor

And here.