Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: peerstore persistence #619

Merged
merged 4 commits into from
May 5, 2020
Merged

Conversation

vasco-santos
Copy link
Member

@vasco-santos vasco-santos commented Apr 27, 2020

This PR is part of the PeerStore improvements EPIC and tackles the milestone 3 - PeerStore Persistence.

As a follow up of the work in #590, #598 and #610, this PR adds a datastore to backup the data stored in the AddressBook and ProtoBoo..

Taking into account that most of our books (or all) will leverage this feature, it was added to the Book base class. However, since different books store data structured in different way and might also intend to store them in the datastore differently (or a subset of it), the Book allows the implementer class to transform the data through function parameters.

Datastore

As stated in #591, the keys used to store this data should have common namespaces to allow it to loaded without prior knowledge on node startup. In addition, this was based on the go-libp2p naming as speaking one language is better than two. The namespaces naming is as follows

AddressBook

All the knownw peer addresses are stored with a key pattern as follows:

/peers/addrs/<b32 peer id no padding>

ProtoBook

All the knownw peer protocols are stored with a key pattern as follows:

/peers/protos/<b32 peer id no padding>

Other books

The other books naming is described in #591, but it is important pointing out that keybook will need a key suffix for specifying the type of key. This was already taken into account while implementing this.

Doubts / pending decisions

Enabled by default

While the goal of persisting the PeerStore is for users to use it, if we default to use the Memory datastore the data will not be available on a browser page refresh, as an example. While storing data in the Memory datastore does not provide us real value, it is the only available datastore which works for every runtime. Accordingly, if users need to specify the datastore they want to use, it might be a good option to also require them to enable the peerStore persistence. As a consequence, there will be no misunderstandings on users thinking that the PeerStore data is being stored, but since it is only stored in Memory, it will not be available in the future.

If we go with not enabling it by default, we should do it once we work on improving the configuration. Since we intend to work on pre built configurations for the main runtimes, we would leverage the proper datastores in them.

Store protocols

As far as I Know, go-libp2p is persisting the AddressBook, KeyBook and MetadataBook, but not the ProtoBook. Firstly, we will run the identify protocol after establishing a connection with other peers and gather their updated protocols. So, as far as we want to connect to the stored peers blindly we do not need to store the protocols as they will be collected. However, there might be scenarios where this data could be relevant (even with no guarantess of being updated). For instance, if we already have a large number of peers stored once a libp2p node starts, it can potentially select a subset of the peers to connect according to the protocols it is running and the known protocols potentially have running.

Considering this, I am leaning towards allowing this to be configured on user land. Each book should be able to have persistence enabled in the node configuration.

EDIT: in the libp2p sync call I got to know that go-libp2p is storing the protocols as well, but under the metadata namespace, with a protocols property: https://github.com/libp2p/go-libp2p-peerstore/blob/master/pstoreds/protobook.go#L55 where meta is the metadata book: https://github.com/libp2p/go-libp2p-peerstore/blob/master/pstoreds/metadata.go

Consider storing multiaddrs metadata

Since we do not have multiaddrs metadata yet we will not store it for now. However, if we intend to store it in the future we should already consider it in terms of the protobuf definition. Potential validity and multiaddr confidence might be relevant in the present, but I am not sure on wether they will have value in the future.

Looking into Browser storage limits references for IndexedDB, I think that it will not be a problem to store this metadata in terms of storage capacity. However, this data might also be modified frequently, which will result in lots of operations with the DB.

Needs:

As a follow up, we should create an issue to benchmark the changes in the PeerStore datastore that are not relevant, since we are "updating" all data from a dirty peer, while only the data of one book might be modified.

@vasco-santos vasco-santos force-pushed the feat/peerstore-persistence branch 2 times, most recently from 63808eb to 5ddbf9f Compare April 27, 2020 08:38
src/peer-store/book.js Outdated Show resolved Hide resolved
@vasco-santos vasco-santos force-pushed the feat/peerstore-persistence branch 6 times, most recently from 098c177 to 3f3d084 Compare April 29, 2020 15:23
@vasco-santos vasco-santos marked this pull request as ready for review April 29, 2020 15:25
@vasco-santos
Copy link
Member Author

@jacobheun you can have an initial look at this. Please read the main message of the PR, since there are some open questions that might influence this implementation.

I am currently assuming this enabled by default, no configuration for type of book to persist, persist protocols as their own thing and not considering multiaddrs metadata. I will need to change what we decide from this, but from my initial thoughts, I would go with:

  • Disabled by default
  • Enabled Persistence configurable by book
  • Persist protocols as its own thing (like it currently is)
  • Do not consider multiaddrs metadata (like it currently is)

@jacobheun
Copy link
Contributor

jacobheun commented Apr 30, 2020

If we go with not enabling it by default, we should do it once we work on improving the configuration. Since we intend to work on pre built configurations for the main runtimes, we would leverage the proper datastores in them.

I think it makes sense to not enable it by default, as it's a waste of processing time since we can't actually persist anything through restarts/refreshes with the memory store.

Considering this, I am leaning towards allowing this to be configured on user land. Each book should be able to have persistence enabled in the node configuration.

As mentioned, go does store this, and I think it makes sense for us to store it by default (assuming persistence is enabled). As we improve Topology usage with the Connection Manager this should help with bootstrapping the peer on restarts, assuming the peers are stable.

Looking into Browser storage limits references for IndexedDB, I think that it will not be a problem to store this metadata in terms of storage capacity. However, this data might also be modified frequently, which will result in lots of operations with the DB.

We can evaluate storing the multiaddr metadata when that work starts. Maintaining TTLs and Confidence can be valuable when restarts are within the TTL expiration time.

re: lots of operations with the DB, we can look at performing flushes to avoid every metadata change causing a write.

@vasco-santos
Copy link
Member Author

vasco-santos commented Apr 30, 2020

We can evaluate storing the multiaddr metadata when that work starts. Maintaining TTLs and Confidence can be valuable when restarts are within the TTL expiration time.

My main concern about this will be the breaking change in the datastore. f you think maintaining ttls and confidence can be valuable, I feel that it might be a good idea to just take this into account in the protobuf definition:

const message = `
message Address {
  required bytes addr = 1;
}

message Addresses {
  repeated Address addrs = 1;
}`

vs

const message = `
message Addresses {
  repeated bytes addrs = 1;
}`

@jacobheun
Copy link
Contributor

Yes, I think we'll want to keep track of Address records and not just multiaddrs, so I would account for that here.

@vasco-santos vasco-santos force-pushed the feat/peerstore-persistence branch 4 times, most recently from 92edef4 to 1f06a3b Compare May 1, 2020 15:43
@vasco-santos vasco-santos requested review from jacobheun and removed request for jacobheun May 1, 2020 16:23
@vasco-santos
Copy link
Member Author

This should now be good to a review @jacobheun

.aegir.js Show resolved Hide resolved
@vasco-santos vasco-santos mentioned this pull request May 4, 2020
1 task
doc/API.md Outdated Show resolved Hide resolved
doc/CONFIGURATION.md Outdated Show resolved Hide resolved
doc/CONFIGURATION.md Show resolved Hide resolved
src/index.js Outdated Show resolved Hide resolved
src/peer-store/README.md Outdated Show resolved Hide resolved
src/peer-store/persistent/index.js Outdated Show resolved Hide resolved
src/peer-store/persistent/index.js Show resolved Hide resolved
_processDatastoreEntry ({ key, value }) {
try {
const keyParts = key.toString().split('/')
const peerId = PeerId.createFromCID(keyParts[3])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this creates a problem because we compute the data for peerStore.peers, but unmarshalling from the datastore doesn't set the peer id in the peerstore itself. So on restart we'd have no ids in the peerstore, but data in each of the books which sounds like a recipe for some edge cases in the future.

Copy link
Member Author

@vasco-santos vasco-santos May 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this does not create an issue now, but the keyBook will be populated with the PeerId. So, I think we will be good there :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly what I mean is that PeerStore.peerIds is an unreliable Map. It might not contain all the peers we know about. PeerStore.peers is computed, so it will be reliable.

test/peer-store/address-book.spec.js Show resolved Hide resolved
test/peer-store/persisted-peer-store.spec.js Show resolved Hide resolved
Co-authored-by: Jacob Heun <jacobheun@gmail.com>
Copy link
Contributor

@jacobheun jacobheun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM

@vasco-santos vasco-santos merged commit 1059c87 into 0.28.x May 5, 2020
@vasco-santos vasco-santos deleted the feat/peerstore-persistence branch May 5, 2020 14:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants