Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PeerId string identifier #680

Open
wemeetagain opened this issue Jun 19, 2020 · 12 comments
Open

PeerId string identifier #680

wemeetagain opened this issue Jun 19, 2020 · 12 comments
Labels
kind/enhancement A net-new feature or improvement to an existing feature status/ready Ready to be worked

Comments

@wemeetagain
Copy link
Member

Type:

Question

Description:

Currently, js-libp2p uses PeerId objects to identify peers, which are compared using equals method and printed with toB58String. In many cases, the b58 string is used to index peers (eg: in Maps, Objects, Sets, etc) and there, implicitly used for id equality.

Has any thought been given to using b58 string encoded peer-ids as the canonical peer identifier throught the codebase? In this case, the keystore would be more heavily relied on to retrieve public/private keys since they wouldn't be (optionally) attached as in PeerId objects. In many cases, checking the validity of a string identifier would still be required, but strings are convenient to use for indexing and equality checking. Perhaps this may also result in general speedups and reduced memory usage, as string equality checking is generally faster than Buffer equality checking and public keys can all be stored in one place.

I definitely haven't thought thru the intricacies, just more an idea after seeing lots of id.toB58String() in my own code, and seeing that peer.ID is also a string in go-libp2p.

Curious more than anything if this is being considered or on the roadmap, and if this has been discussed before.

@vasco-santos
Copy link
Member

This seems a good idea! Totally in favour of moving on a direction of using a string identifier and rely on the PeerStore.
We plan to move away from the id.toB58String() in favour of id.toString(). When we work on this change, I think we should definitely consider this suggestion.

@vasco-santos vasco-santos added the kind/enhancement A net-new feature or improvement to an existing feature label Jun 21, 2020
@jacobheun jacobheun added the status/ready Ready to be worked label Jul 9, 2020
@wemeetagain
Copy link
Member Author

I noticed, profiling lodestar, that requesting peers from the peer store (calling libp2p.peerStore.peers) is not very performant, taking roughly 500ms per call (with roughly 50 or so peers).
It appears to mainly be a result of creating PeerId objects.

get-peers-flame

I'd expect that resolving this issue in the direction of using string peer-ids directly will result in a significant performance improvement.

@vasco-santos
Copy link
Member

Thanks for the data @wemeetagain
So, once we use peerIdStr everywhere we will be able to return also the string in the peerStore.peers.

However, we should look into what is causing this performance degradation. The bottleneck should probably be in https://github.com/libp2p/js-peer-id/blob/master/src/index.js#L27 creating the b58string which we already have, or creating the cid in https://github.com/libp2p/js-peer-id/blob/master/src/index.js#L248 . Both have sync encodings/decodings.

@jacobheun do you think we can get rid of b58string usage by default in all libp2p modules for 0.31?

@jacobheun
Copy link
Contributor

However, we should look into what is causing this performance degradation.

Yes, we need to get more benchmarks in place and run those regularly (not necessarily as PRs, perhaps daily crons). The fact that this is reassembling the data from the individual books each time is problematic, and the code is doing a lot of iteration on the data to reassemble it.

@jacobheun do you think we can get rid of b58string usage by default in all libp2p modules for 0.31?

@vasco-santos yes, we can internally change or base (likely base36 based on subdomain usefulness) and just start using toString(). If applications need to b58 for some reason, they can do the conversion before printing.

@dapplion
Copy link
Contributor

dapplion commented Mar 24, 2021

See this CPU profile showing the bottleneck in performance.

Screenshot from 2021-03-24 09-08-26

Screenshot from 2021-03-24 09-08-48

@dapplion
Copy link
Contributor

After refactoring peer managment in Lodestar I strongly believe moving from a class PeerId to a string as canonical identifier should be a priority. It would simplify the codebase significantly and improve performance omitting costly serialization and deserialization.

@BigLep
Copy link
Contributor

BigLep commented Jun 28, 2022

2022-06-28 conversation: in general, it has been intentional for js-libp2p to not just strings to avoid ambiguity around peerIds, multiaddrs, etc. There have been improvements in recent js-libp2p releases. Another look at the Lodestar performance impacts will be taken once ChainSafe/lodestar#4114 is addressed.

@GlenDC
Copy link

GlenDC commented Jan 6, 2023

I am not sure if related, but if I have code such as the following:

const node = await createLibp2p({
    // ...
})

// ...

client.publish('peers', { id: node.peerId, addrs: listenAddrs })
const peers = []
const peerTopic = await client.subscribe('peers')
for (let i = 0; i < runenv.testInstanceCount; i++) {
    const result = await peerTopic.wait.next()
    peers.push(result.value)
}
peerTopic.cancel()

I can only compare the peerIds (to make sure I do not dial to my own node for example) by using == rather then the tripple variant (===):

peers[0].id == node.peerId // true
peers[0].id === node.peerId // false

I suppose because I might need to convert it back to a type from a string, when receiving it from over the network?

@wemeetagain
Copy link
Member Author

Related to the concerns about avoiding ambiguity around peer ids, multiaddrs, other string types:

Some discussion around "nominal types" that provides some insights: microsoft/TypeScript#202

We may use "branded types" to differentiate between various string types and functionally achieve nominal types.
As an example, see https://github.com/ChainSafe/ts-peer-id/blob/master/src/index.ts
It achieves code that works like so:

import {PeerIdString, validateString} from '@chainsafe/ts-peer-id'

const str = '...'

function doWithPeerId(id: PeerIdString): void {...}

doWithPeerId(str) // type error, `string` doesn't satisfy `PeerIdString`

validateString(str) // asserts that the `str` is of type `PeerIdString`

doWithPeerId(str) // now there's no type error

const str2 = '...' as PeerIdString // alternatively, a string can be explicitly cast to `PeerIdString`

@achingbrain
Copy link
Member

I'm doing some memory analysis of processes under heavy load, I think this might be worth experimenting with - and should be extended to multiaddrs and CIDs too. We keep a lot of Uint8Arrays in memory which are famously inefficient - JS heap size is under control but the overall RSS of the process increases until it dies. Storing these types as strings should help here, though CPU usage may go up when we need to operate on these types & it'll be a breaking change.

@achingbrain
Copy link
Member

achingbrain commented Apr 9, 2024

One thing we could do that might be a better middle ground, is to have the PeerId/Multiaddr objects store strings internally and not Uint8Arrays - at the moment we store the PeerId multihash (which contains a byte array) and the Multiaddr as a string/byte array/tuples/string tuples.

This would be a non-breaking change and should be lighter on memory use as we'd not be storing Uint8Arrays in long-lived objects. It still gives us type safety and is a lot less disruptive than changing everything to strings.

@SgtPooki
Copy link
Member

JS Colo 2024 rough discussion
  • PeerId interface has two multihash

what do we use more often, string version of peerid or multihash version of peerId?

  • when we use the public key, we pull it out of the multihash digest, which contains the protobuf with the keytype and the key, when we use protons to read that out, it returns a subarray of the main array so we're not copying.

what we could do:

  1. deploy two bootstrappers & compare CPU and mem usage
    • one where peerId stores the multihash as a multihash digest object (current)
    • one where it stores the multihash as a string (potential improvement)

Action item:

  1. determine whether we operate on peerIds more as strings (a lot of toString()) or the multihash digest version of a peerId
    1. deploy two bootstrappers & compare CPU and mem usage
      • one where peerId stores the multihash as a multihash digest object (current)
      • one where it stores the multihash as a string (potential improvement)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement A net-new feature or improvement to an existing feature status/ready Ready to be worked
Projects
None yet
Development

No branches or pull requests

8 participants