RFC 0001: Text Peer Ids as CIDs #209

Merged: 3 commits, Oct 1, 2019
67 changes: 67 additions & 0 deletions RFC/0001-text-peerid-cid.md
@@ -0,0 +1,67 @@
- Start Date: 2019-08-15
- Related issues: [go-ipfs/issues/5287](https://github.com/ipfs/go-ipfs/issues/5287), [multicodec/issues/130](https://github.com/multiformats/multicodec/issues/130), [go-libp2p-core/pull/41](https://github.com/libp2p/go-libp2p-core/pull/41)

# RFC 0001: Text Peer Ids as CIDs

## Abstract

This RFC modifies the Peer Id spec to change the default string representation
from Multihash to CIDv1 in Base32, and to support encoding and decoding text Peer Ids as CIDs.

[ipld-cid-spec]: https://github.com/ipld/cid

## Motivation

1. The current text representation of Peer Ids ([multihash][multihash] in [Base58btc][base58btc]) is case-sensitive.
This means we can't use it in case-insensitive contexts such as domain names ([RFC1035][rfc1035] + [RFC1123][rfc1123]) or [FAT][fat] filesystems.
2. [CIDs][ipld-cid-spec] provide [multibase][multibase] support, and `base32`
makes a [safe default][cidv1b32-move] that works in case-insensitive contexts,
enabling us to put Peer Ids [in domains][cid-in-subdomains] or create files with Peer Ids as names.
3. It's much easier to upgrade wire protocols than text.
This RFC makes Peer Ids in text form fully self-describing, making them more future-proof.
A dedicated [multicodec][multicodec] in the text-encoded CID will indicate that [it's a hash of a libp2p public key][libp2p-key-multicodec].

[rfc1035]: http://tools.ietf.org/html/rfc1035
[rfc1123]: https://tools.ietf.org/html/rfc1123
[multibase]: https://github.com/multiformats/multibase/
[multicodec]: https://github.com/multiformats/multicodec
[multihash]: https://github.com/multiformats/multihash
[cid-in-subdomains]: https://github.com/ipfs/in-web-browsers/issues/89
[libp2p-key-multicodec]: https://github.com/multiformats/multicodec/issues/130
[cidv1b32-move]: https://github.com/ipfs/ipfs/issues/337
[base58btc]: https://en.bitcoinwiki.org/wiki/Base58#Alphabet_Base58
[fat]: https://en.wikipedia.org/wiki/Design_of_the_FAT_file_system

## Detailed design

1. Switch text encoding and decoding of Peer Ids from Multihash to [CID][ipld-cid-spec].
2. The new text representation should be CIDv1 with additional requirements:
   - MUST have [multicodec][multicodec] set to `libp2p-key` (`0x72`)
   - SHOULD have [multibase][multibase] set to `base32` (Base32 without padding, as specified by [RFC4648][rfc4648])

[rfc4648]: https://tools.ietf.org/html/rfc4648
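As an illustration of the requirements above, the following minimal Python sketch renders an existing Peer Id multihash as a text CIDv1: the CID version (`1`) and the `libp2p-key` code (`0x72`) are written as unsigned varints in front of the multihash, the bytes are encoded as lowercase base32 without padding, and the multibase prefix `b` is prepended. The function and constant names are hypothetical, and the sample multihash is a SHA-256 digest of an arbitrary string rather than a real public key.

```python
import base64
import hashlib

CID_VERSION_1 = 0x01
LIBP2P_KEY_CODEC = 0x72  # multicodec code for libp2p-key, per this RFC
BASE32_MULTIBASE_PREFIX = "b"  # multibase code for lowercase, unpadded RFC 4648 base32


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def peer_id_to_text(multihash: bytes) -> str:
    """Render a Peer Id multihash as a base32 CIDv1 with the libp2p-key codec."""
    cid = encode_uvarint(CID_VERSION_1) + encode_uvarint(LIBP2P_KEY_CODEC) + multihash
    # b32encode emits uppercase with '=' padding; the text form uses neither.
    b32 = base64.b32encode(cid).decode("ascii").rstrip("=").lower()
    return BASE32_MULTIBASE_PREFIX + b32


if __name__ == "__main__":
    # Hypothetical sha2-256 multihash: code 0x12, length 0x20, 32-byte digest.
    sample = bytes([0x12, 0x20]) + hashlib.sha256(b"example").digest()
    print(peer_id_to_text(sample))  # starts with "bafzbei"
```

Every SHA-256 Peer Id rendered this way starts with `bafzbei`, because those first characters encode only the fixed prefix bytes `01 72 12 20`.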

### Upgrade path

1. Release support for reading Peer Ids represented as CIDv1
2. Wait three months or until the next release (whichever comes first)
3. Switch the default Peer Id output format to CIDv1 in Base32

### Backward compatibility

The old text representation (Multihash encoded as [`base58btc`][base58btc])
is a valid CIDv0 and does not require any special handling.
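The compatibility argument can be illustrated with a small sketch: a legacy Peer Id is simply the raw multihash bytes written in base58btc with no prefix, which is exactly the CIDv0 form, so a CID-aware decoder needs no extra case for it. The helper names are hypothetical, base58btc is implemented by hand because the Python standard library does not provide it, and `is_legacy_cidv0` applies the 46-character, `Qm`-prefix rule from the CID decoding algorithm (which holds for SHA-256 multihashes).

```python
import hashlib

# Bitcoin base58 alphabet: omits 0, O, I and l to avoid visual ambiguity.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode bytes as base58btc (big-endian base conversion)."""
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, rem = divmod(n, 58)
        digits.append(B58_ALPHABET[rem])
    # Each leading zero byte is written as a leading '1'.
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * zeros + "".join(reversed(digits))


def b58decode(text: str) -> bytes:
    """Decode base58btc text; raises ValueError on characters outside the alphabet."""
    n = 0
    for ch in text:
        n = n * 58 + B58_ALPHABET.index(ch)
    zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * zeros + n.to_bytes((n.bit_length() + 7) // 8, "big")


def is_legacy_cidv0(text: str) -> bool:
    """CIDv0 detection rule: 46 characters starting with 'Qm'."""
    return len(text) == 46 and text.startswith("Qm")


if __name__ == "__main__":
    # Hypothetical sha2-256 multihash (not derived from a real public key).
    multihash = bytes([0x12, 0x20]) + hashlib.sha256(b"example").digest()
    legacy = b58encode(multihash)
    print(legacy, is_legacy_cidv0(legacy))  # a 46-character "Qm..." string, True
```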


## Alternatives

We could just add a [multibase][multibase] prefix to the multihash, but that requires more work and introduces a new format.
This option was rejected because using CIDs enables reuse of existing serializers/deserializers and does not create any new standards.

## Unresolved questions

This RFC defers using Peer Ids as CIDs on the wire; that can be revisited if it ever becomes relevant.

[go-libp2p-core-41]: https://github.com/libp2p/go-libp2p-core/pull/41
[libp2p-specs-111]: https://github.com/libp2p/specs/issues/111
67 changes: 52 additions & 15 deletions peer-ids/peer-ids.md
@@ -2,10 +2,9 @@

| Lifecycle Stage | Maturity Level | Status | Latest Revision |
|-----------------|----------------|--------|-----------------|
| 3A | Recommendation | Active | r0, 2019-05-23 |
| 3A | Recommendation | Active | r1, 2019-08-15 |


**Authors**: [@mgoelzer], [@yusefnapora]
**Authors**: [@mgoelzer], [@yusefnapora], [@lidel]

**Interest Group**: [@raulk], [@vyzo], [@Stebalien]

@@ -14,6 +13,7 @@
[@raulk]: https://github.com/raulk
[@vyzo]: https://github.com/vyzo
[@Stebalien]: https://github.com/Stebalien
[@lidel]: https://github.com/lidel

See the [lifecycle document](../00-framework-01-spec-lifecycle.md) for context
about maturity level and spec status.
@@ -53,7 +53,7 @@ Key encodings and message signing semantics are

## Keys

Our key pairs are wrapped in a [simple protobuf](https://github.com/libp2p/go-libp2p-crypto/blob/master/pb/crypto.proto),
defined using the [Protobuf version 2 syntax](https://developers.google.com/protocol-buffers/docs/proto):

```protobuf
```
@@ -107,7 +107,7 @@ Here is the process by which we generate peer ids based on the public component
3. Serialize the protobuf containing the public key into bytes using the [canonical protobuf encoding](https://developers.google.com/protocol-buffers/docs/encoding).
4. If the length of the serialized bytes is <= 42, then we compute the "identity" multihash of the serialized bytes. In other words, no hashing is performed, but the [multihash format is still followed](https://github.com/multiformats/multihash) (byte plus varint plus serialized bytes). The idea here is that if the serialized byte array is short enough, we can fit it in a multihash verbatim without having to condense it using a hash function.
5. If the length is > 42, then we hash it using the SHA256 multihash.
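Steps 4 and 5 can be sketched in Python as follows, assuming the input is the canonically serialized `PublicKey` protobuf produced in step 3 (serialization itself is not shown). A multihash is the hash function code as a varint, the digest length as a varint, then the digest; `0x00` is the identity code and `0x12` is sha2-256. The function and constant names are hypothetical.

```python
import hashlib

IDENTITY_CODE = 0x00     # multihash code: identity (bytes stored verbatim)
SHA2_256_CODE = 0x12     # multihash code: sha2-256
MAX_INLINE_KEY_LEN = 42  # threshold from steps 4 and 5


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def peer_id_multihash(serialized_public_key: bytes) -> bytes:
    """Build the Peer Id multihash from a serialized PublicKey protobuf."""
    if len(serialized_public_key) <= MAX_INLINE_KEY_LEN:
        # Short keys are inlined with the identity "hash": no hashing at all.
        code, digest = IDENTITY_CODE, serialized_public_key
    else:
        code, digest = SHA2_256_CODE, hashlib.sha256(serialized_public_key).digest()
    return encode_uvarint(code) + encode_uvarint(len(digest)) + digest
```

With this rule, a 36-byte serialized key yields a 38-byte identity multihash (`00 24` followed by the key bytes), while any key longer than 42 bytes yields a 34-byte sha2-256 multihash (`12 20` followed by the digest).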

### Note about deterministic encoding

Deterministic encoding of the `PublicKey` message is desirable, as it ensures
@@ -131,16 +131,54 @@ behavior.

### String representation

Peer Ids are multihashes, and they are often encoded into strings.
The canonical string representation of a Peer Id is a base58 encoding with
[the alphabet used by bitcoin](https://en.bitcoinwiki.org/wiki/Base58#Alphabet_Base58).
This encoding is sometimes abbreviated as `base58btc`.
Peer Ids are [multihashes][multihash] canonically represented with [CIDs](https://github.com/ipld/cid) when encoded into strings.

Encoding and decoding of the string representation MUST follow the [CID specification][cid-decoding].

Implementations parsing Peer Ids from text MUST support both base58btc CIDv0 and base32 CIDv1, and they MUST generate base32-encoded CIDv1 by default. Generating CIDv0 is allowed as an opt-in (behind a flag).

CIDv0 is a multihash encoded in base58btc, with no prefix.
CIDv1 is a multihash with a prefix that specifies the base encoding (multibase), the CID version, and the type of data behind it (multicodec):

```
<cidv1> ::= <multibase><cid-version><multicodec><multihash>
```

#### libp2p-key CID

The canonical string representation of a Peer Id is a CIDv1
with `base32` [multibase][multibase] ([RFC4648](https://tools.ietf.org/html/rfc4648), without padding) and `libp2p-key` [multicodec][multicodec]:

An example of a `base58btc` encoded SHA256 peer id: `QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N`.
| multibase | cid version | multicodec |
| --------- | ----------- | ------------ |
| `base32` | `1` | `libp2p-key` |

Note that some projects using libp2p will prefix "base encoded" strings with a
[multibase](https://github.com/multiformats/multibase) code that identifies the encoding base and alphabet.
Peer ids do not use multibase, and can be assumed to be encoded as `base58btc`.
- The `libp2p-key` multicodec is mandatory when serializing to text (it makes the Peer Id self-describing)
- `base32` is the default multibase encoding; projects are free to use a different one if it better suits their needs

##### Decoding string representation

To decode a Peer Id from its string representation, use the following algorithm:

- If it is 46 characters long and starts with `Qm`, it is a CIDv0: decode it as a base58btc multihash.
- Otherwise, decode it according to the multibase and [CID spec][cid-decoding].
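The two decoding rules above can be sketched as a single Python function that returns the Peer Id multihash. This is a minimal sketch, not a complete decoder: only the default `b` (base32) multibase is handled, other prefixes are rejected rather than decoded, and a CIDv1 whose multicodec is not `libp2p-key` is treated as an error. The function names are hypothetical, and base58btc decoding is implemented inline because the Python standard library does not provide it.

```python
import base64
from typing import Tuple

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
LIBP2P_KEY_CODEC = 0x72


def b58decode(text: str) -> bytes:
    """Decode base58btc text (leading '1's are leading zero bytes)."""
    n = 0
    for ch in text:
        n = n * 58 + B58_ALPHABET.index(ch)  # ValueError on invalid characters
    zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * zeros + n.to_bytes((n.bit_length() + 7) // 8, "big")


def decode_uvarint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Read an unsigned LEB128 varint; return (value, offset after it)."""
    value = shift = 0
    while True:
        byte = buf[offset]  # IndexError if the input is truncated
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def decode_peer_id(text: str) -> bytes:
    """Return the multihash carried by a text Peer Id."""
    if len(text) == 46 and text.startswith("Qm"):
        # Rule 1: legacy CIDv0, a bare base58btc multihash.
        return b58decode(text)
    # Rule 2: multibase prefix character, then the binary CID.
    prefix, body = text[:1], text[1:]
    if prefix != "b":
        raise ValueError(f"multibase prefix {prefix!r} not handled by this sketch")
    # Python's decoder expects uppercase and '=' padding, which the text form omits.
    raw = base64.b32decode(body.upper() + "=" * (-len(body) % 8))
    version, pos = decode_uvarint(raw)
    codec, pos = decode_uvarint(raw, pos)
    if version != 1:
        raise ValueError(f"unsupported CID version {version}")
    if codec != LIBP2P_KEY_CODEC:
        raise ValueError(f"multicodec 0x{codec:x} is not libp2p-key")
    return raw[pos:]
```

Applied to the CIDv1 and CIDv0 example strings listed in this section, both branches should return the same 34-byte sha2-256 multihash (starting with `12 20`).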


Examples:

- SHA256 Peer Id encoded as canonical [CIDv1][cid-versions]:
`bafzbeie5745rpv2m6tjyuugywy4d5ewrqgqqhfnf445he3omzpjbx5xqxe` ([inspect](http://cid.ipfs.io/#bafzbeie5745rpv2m6tjyuugywy4d5ewrqgqqhfnf445he3omzpjbx5xqxe))
Comment (Member, Author): FYSA, the "inspect" link requires multiformats/cid-utils-website#14 to work correctly.

- Peer Ids that do not start with a valid multibase prefix are assumed to be legacy [CIDv0][cid-versions]
(a multihash with implicit [`base58btc`][base58btc] encoding, without any prefix).
An example of the same Peer Id as a legacy CIDv0: `QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N`


[multihash]: https://github.com/multiformats/multihash
[multicodec]: https://github.com/multiformats/multicodec
[multibase]: https://github.com/multiformats/multibase
[base58btc]: https://en.bitcoinwiki.org/wiki/Base58#Alphabet_Base58
[cid-decoding]: https://github.com/multiformats/cid#decoding-algorithm
[cid-versions]: https://github.com/multiformats/cid#versions

## How Keys are Encoded and Messages Signed

@@ -152,7 +190,7 @@ Four key types are supported:

Implementations MUST support RSA and Ed25519. Implementations MAY support Secp256k1 and ECDSA, but nodes using those keys may not be able to connect to all other nodes.

In all cases, implementations MAY allow the user to enable or disable specific key types via configuration.
Note that disabling support for compulsory key types may hinder connectivity.

Keys are encoded into byte arrays and serialized into the `Data` field of the
@@ -204,4 +242,3 @@ We encode the public key using ASN.1 DER.
We encode the private key using DER-encoded PKIX.

To sign a message, we hash the message with SHA-256, sign the hash with the [ECDSA standard algorithm](https://tools.ietf.org/html/rfc6979), and then encode the signature using [DER-encoded ASN.1](https://wiki.openssl.org/index.php/DER).
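Python's standard library has no ECDSA implementation, so the following sketch covers only the final step: packing the signature integers `r` and `s` (as returned by an ECDSA library over the SHA-256 digest) into DER. It assumes the common `ECDSA-Sig-Value ::= SEQUENCE { r INTEGER, s INTEGER }` structure from RFC 3279, which the text above does not spell out; the function names are hypothetical.

```python
def der_length(n: int) -> bytes:
    """DER length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_integer(value: int) -> bytes:
    """DER INTEGER for a non-negative value (minimal two's complement)."""
    if value < 0:
        raise ValueError("ECDSA r and s are positive integers")
    # (bit_length + 8) // 8 leaves room for a 0x00 byte when the top bit is set.
    body = value.to_bytes((value.bit_length() + 8) // 8, "big")
    return b"\x02" + der_length(len(body)) + body


def der_encode_ecdsa_signature(r: int, s: int) -> bytes:
    """Encode (r, s) as SEQUENCE { r INTEGER, s INTEGER }."""
    content = der_integer(r) + der_integer(s)
    return b"\x30" + der_length(len(content)) + content


if __name__ == "__main__":
    print(der_encode_ecdsa_signature(1, 2).hex())  # 3006020101020102
```

The leading `0x00` added for values with the top bit set keeps DER INTEGERs from being read as negative, which is why a 32-byte `r` can occupy 33 content bytes.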