-
Notifications
You must be signed in to change notification settings - Fork 445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: peerstore persistence #619
Conversation
63808eb
to
5ddbf9f
Compare
098c177
to
3f3d084
Compare
@jacobheun you can have an initial look at this. Please read the main message of the PR, since there are some open questions that might influence this implementation. I am currently assuming this enabled by default, no configuration for type of book to persist, persist protocols as their own thing and not considering multiaddrs metadata. I will need to change what we decide from this, but from my initial thoughts, I would go with:
|
3f3d084
to
d93c1be
Compare
I think it makes sense to not enable it by default, as it's a waste of processing time since we can't actually persist anything through restarts/refreshes with the memory store.
As mentioned, go does store this, and I think it makes sense for us to store it by default (assuming persistence is enabled). As we improve Topology usage with the Connection Manager this should help with bootstrapping the peer on restarts, assuming the peers are stable.
We can evaluate storing the multiaddr metadata when that work starts. Maintaining TTLs and Confidence can be valuable when restarts are within the TTL expiration time. re: lots of operations with the DB, we can look at performing flushes to avoid every metadata change causing a write. |
My main concern about this will be the breaking change in the datastore. f you think maintaining ttls and confidence can be valuable, I feel that it might be a good idea to just take this into account in the protobuf definition: const message = `
message Address {
required bytes addr = 1;
}
message Addresses {
repeated Address addrs = 1;
}` vs const message = `
message Addresses {
repeated bytes addrs = 1;
}` |
Yes, I think we'll want to keep track of Address records and not just multiaddrs, so I would account for that here. |
92edef4
to
1f06a3b
Compare
This should now be good to a review @jacobheun |
1f06a3b
to
898e983
Compare
_processDatastoreEntry ({ key, value }) { | ||
try { | ||
const keyParts = key.toString().split('/') | ||
const peerId = PeerId.createFromCID(keyParts[3]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this creates a problem because we compute the data for peerStore.peers
, but unmarshalling from the datastore doesn't set the peer id in the peerstore itself. So on restart we'd have no ids in the peerstore, but data in each of the books which sounds like a recipe for some edge cases in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this does not create an issue now, but the keyBook will be populated with the PeerId. So, I think we will be good there :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly what I mean is that PeerStore.peerIds
is an unreliable Map. It might not contain all the peers we know about. PeerStore.peers
is computed, so it will be reliable.
Co-authored-by: Jacob Heun <jacobheun@gmail.com>
38c661a
to
519eef0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM
This PR is part of the PeerStore improvements EPIC and tackles the milestone 3 - PeerStore Persistence.
As a follow up of the work in #590, #598 and #610, this PR adds a datastore to backup the data stored in the AddressBook and ProtoBoo..
Taking into account that most of our books (or all) will leverage this feature, it was added to the Book base class. However, since different books store data structured in different way and might also intend to store them in the datastore differently (or a subset of it), the Book allows the implementer class to transform the data through function parameters.
Datastore
As stated in #591, the keys used to store this data should have common namespaces to allow it to loaded without prior knowledge on node startup. In addition, this was based on the
go-libp2p
naming as speaking one language is better than two. The namespaces naming is as followsAddressBook
All the knownw peer addresses are stored with a key pattern as follows:
/peers/addrs/<b32 peer id no padding>
ProtoBook
All the knownw peer protocols are stored with a key pattern as follows:
/peers/protos/<b32 peer id no padding>
Other books
The other books naming is described in #591, but it is important pointing out that keybook will need a key suffix for specifying the type of key. This was already taken into account while implementing this.
Doubts / pending decisions
Enabled by default
While the goal of persisting the PeerStore is for users to use it, if we default to use the Memory datastore the data will not be available on a browser page refresh, as an example. While storing data in the Memory datastore does not provide us real value, it is the only available datastore which works for every runtime. Accordingly, if users need to specify the datastore they want to use, it might be a good option to also require them to enable the peerStore persistence. As a consequence, there will be no misunderstandings on users thinking that the PeerStore data is being stored, but since it is only stored in Memory, it will not be available in the future.
If we go with not enabling it by default, we should do it once we work on improving the configuration. Since we intend to work on pre built configurations for the main runtimes, we would leverage the proper datastores in them.
Store protocols
As far as I Know,
go-libp2p
is persisting the AddressBook, KeyBook and MetadataBook, but not the ProtoBook. Firstly, we will run the identify protocol after establishing a connection with other peers and gather their updated protocols. So, as far as we want to connect to the stored peers blindly we do not need to store the protocols as they will be collected. However, there might be scenarios where this data could be relevant (even with no guarantess of being updated). For instance, if we already have a large number of peers stored once a libp2p node starts, it can potentially select a subset of the peers to connect according to the protocols it is running and the known protocols potentially have running.Considering this, I am leaning towards allowing this to be configured on user land. Each book should be able to have persistence enabled in the node configuration.
EDIT: in the libp2p sync call I got to know that
go-libp2p
is storing the protocols as well, but under themetadata
namespace, with aprotocols
property: https://github.com/libp2p/go-libp2p-peerstore/blob/master/pstoreds/protobook.go#L55 wheremeta
is the metadata book: https://github.com/libp2p/go-libp2p-peerstore/blob/master/pstoreds/metadata.goConsider storing multiaddrs metadata
Since we do not have multiaddrs metadata yet we will not store it for now. However, if we intend to store it in the future we should already consider it in terms of the protobuf definition. Potential validity and multiaddr confidence might be relevant in the present, but I am not sure on wether they will have value in the future.
Looking into Browser storage limits references for IndexedDB, I think that it will not be a problem to store this metadata in terms of storage capacity. However, this data might also be modified frequently, which will result in lots of operations with the DB.
Needs:
As a follow up, we should create an issue to benchmark the changes in the PeerStore datastore that are not relevant, since we are "updating" all data from a dirty peer, while only the data of one book might be modified.