feat: peerstore persistence #619

vasco-santos · 2020-04-27T08:34:13Z

This PR is part of the PeerStore improvements EPIC and tackles the milestone 3 - PeerStore Persistence.

As a follow up of the work in #590, #598 and #610, this PR adds a datastore to backup the data stored in the AddressBook and ProtoBoo..

Taking into account that most of our books (or all) will leverage this feature, it was added to the Book base class. However, since different books store data structured in different way and might also intend to store them in the datastore differently (or a subset of it), the Book allows the implementer class to transform the data through function parameters.

Datastore

As stated in #591, the keys used to store this data should have common namespaces to allow it to loaded without prior knowledge on node startup. In addition, this was based on the go-libp2p naming as speaking one language is better than two. The namespaces naming is as follows

AddressBook

All the knownw peer addresses are stored with a key pattern as follows:

/peers/addrs/<b32 peer id no padding>

ProtoBook

All the knownw peer protocols are stored with a key pattern as follows:

/peers/protos/<b32 peer id no padding>

Other books

The other books naming is described in #591, but it is important pointing out that keybook will need a key suffix for specifying the type of key. This was already taken into account while implementing this.

Doubts / pending decisions

Enabled by default

While the goal of persisting the PeerStore is for users to use it, if we default to use the Memory datastore the data will not be available on a browser page refresh, as an example. While storing data in the Memory datastore does not provide us real value, it is the only available datastore which works for every runtime. Accordingly, if users need to specify the datastore they want to use, it might be a good option to also require them to enable the peerStore persistence. As a consequence, there will be no misunderstandings on users thinking that the PeerStore data is being stored, but since it is only stored in Memory, it will not be available in the future.

If we go with not enabling it by default, we should do it once we work on improving the configuration. Since we intend to work on pre built configurations for the main runtimes, we would leverage the proper datastores in them.

Store protocols

As far as I Know, go-libp2p is persisting the AddressBook, KeyBook and MetadataBook, but not the ProtoBook. Firstly, we will run the identify protocol after establishing a connection with other peers and gather their updated protocols. So, as far as we want to connect to the stored peers blindly we do not need to store the protocols as they will be collected. However, there might be scenarios where this data could be relevant (even with no guarantess of being updated). For instance, if we already have a large number of peers stored once a libp2p node starts, it can potentially select a subset of the peers to connect according to the protocols it is running and the known protocols potentially have running.

Considering this, I am leaning towards allowing this to be configured on user land. Each book should be able to have persistence enabled in the node configuration.

EDIT: in the libp2p sync call I got to know that go-libp2p is storing the protocols as well, but under the metadata namespace, with a protocols property: https://github.com/libp2p/go-libp2p-peerstore/blob/master/pstoreds/protobook.go#L55 where meta is the metadata book: https://github.com/libp2p/go-libp2p-peerstore/blob/master/pstoreds/metadata.go

Consider storing multiaddrs metadata

Since we do not have multiaddrs metadata yet we will not store it for now. However, if we intend to store it in the future we should already consider it in terms of the protobuf definition. Potential validity and multiaddr confidence might be relevant in the present, but I am not sure on wether they will have value in the future.

Looking into Browser storage limits references for IndexedDB, I think that it will not be a problem to store this metadata in terms of storage capacity. However, this data might also be modified frequently, which will result in lots of operations with the DB.

Needs:

As a follow up, we should create an issue to benchmark the changes in the PeerStore datastore that are not relevant, since we are "updating" all data from a dirty peer, while only the data of one book might be modified.

src/peer-store/book.js

vasco-santos · 2020-04-29T15:30:08Z

@jacobheun you can have an initial look at this. Please read the main message of the PR, since there are some open questions that might influence this implementation.

I am currently assuming this enabled by default, no configuration for type of book to persist, persist protocols as their own thing and not considering multiaddrs metadata. I will need to change what we decide from this, but from my initial thoughts, I would go with:

Disabled by default
Enabled Persistence configurable by book
Persist protocols as its own thing (like it currently is)
Do not consider multiaddrs metadata (like it currently is)

jacobheun · 2020-04-30T14:28:16Z

If we go with not enabling it by default, we should do it once we work on improving the configuration. Since we intend to work on pre built configurations for the main runtimes, we would leverage the proper datastores in them.

I think it makes sense to not enable it by default, as it's a waste of processing time since we can't actually persist anything through restarts/refreshes with the memory store.

Considering this, I am leaning towards allowing this to be configured on user land. Each book should be able to have persistence enabled in the node configuration.

As mentioned, go does store this, and I think it makes sense for us to store it by default (assuming persistence is enabled). As we improve Topology usage with the Connection Manager this should help with bootstrapping the peer on restarts, assuming the peers are stable.

Looking into Browser storage limits references for IndexedDB, I think that it will not be a problem to store this metadata in terms of storage capacity. However, this data might also be modified frequently, which will result in lots of operations with the DB.

We can evaluate storing the multiaddr metadata when that work starts. Maintaining TTLs and Confidence can be valuable when restarts are within the TTL expiration time.

re: lots of operations with the DB, we can look at performing flushes to avoid every metadata change causing a write.

vasco-santos · 2020-04-30T14:37:05Z

We can evaluate storing the multiaddr metadata when that work starts. Maintaining TTLs and Confidence can be valuable when restarts are within the TTL expiration time.

My main concern about this will be the breaking change in the datastore. f you think maintaining ttls and confidence can be valuable, I feel that it might be a good idea to just take this into account in the protobuf definition:

const message = `
message Address {
  required bytes addr = 1;
}

message Addresses {
  repeated Address addrs = 1;
}`

vs

const message = `
message Addresses {
  repeated bytes addrs = 1;
}`

jacobheun · 2020-04-30T14:43:36Z

Yes, I think we'll want to keep track of Address records and not just multiaddrs, so I would account for that here.

vasco-santos · 2020-05-01T16:23:54Z

This should now be good to a review @jacobheun

.aegir.js

doc/API.md

doc/CONFIGURATION.md

src/index.js

src/peer-store/README.md

src/peer-store/persistent/index.js

jacobheun · 2020-05-05T12:33:32Z

src/peer-store/persistent/index.js

+  _processDatastoreEntry ({ key, value }) {
+    try {
+      const keyParts = key.toString().split('/')
+      const peerId = PeerId.createFromCID(keyParts[3])


I don't think this creates a problem because we compute the data for peerStore.peers, but unmarshalling from the datastore doesn't set the peer id in the peerstore itself. So on restart we'd have no ids in the peerstore, but data in each of the books which sounds like a recipe for some edge cases in the future.

Yeah, this does not create an issue now, but the keyBook will be populated with the PeerId. So, I think we will be good there :)

Mostly what I mean is that PeerStore.peerIds is an unreliable Map. It might not contain all the peers we know about. PeerStore.peers is computed, so it will be reliable.

test/peer-store/address-book.spec.js

test/peer-store/persisted-peer-store.spec.js

Co-authored-by: Jacob Heun <jacobheun@gmail.com>

jacobheun

This LGTM

vasco-santos force-pushed the feat/peerstore-persistence branch 2 times, most recently from 63808eb to 5ddbf9f Compare April 27, 2020 08:38

vasco-santos mentioned this pull request Apr 27, 2020

PeerStore improvements #582

Closed

vasco-santos commented Apr 27, 2020

View reviewed changes

src/peer-store/book.js Outdated Show resolved Hide resolved

vasco-santos force-pushed the feat/peerstore-persistence branch 6 times, most recently from 098c177 to 3f3d084 Compare April 29, 2020 15:23

vasco-santos marked this pull request as ready for review April 29, 2020 15:25

vasco-santos requested a review from jacobheun April 29, 2020 15:26

vasco-santos force-pushed the feat/peerstore-persistence branch from 3f3d084 to d93c1be Compare April 30, 2020 10:09

vasco-santos force-pushed the feat/peerstore-persistence branch 4 times, most recently from 92edef4 to 1f06a3b Compare May 1, 2020 15:43

vasco-santos requested review from jacobheun and removed request for jacobheun May 1, 2020 16:23

vasco-santos force-pushed the 0.28.x branch from e63a2eb to 0252eaf Compare May 1, 2020 19:50

vasco-santos added 3 commits May 1, 2020 21:51

feat: peerStore persistence

224ab16

chore: persist delete with test

9132ae5

refactor: peristent peer-store extended class and disabled by defaul

898e983

vasco-santos force-pushed the feat/peerstore-persistence branch from 1f06a3b to 898e983 Compare May 1, 2020 19:55

vasco-santos commented May 1, 2020

View reviewed changes

.aegir.js Show resolved Hide resolved

vasco-santos mentioned this pull request May 4, 2020

feat: keybook #626

Merged

1 task

jacobheun reviewed May 5, 2020

View reviewed changes

chore: apply suggestions from code review

519eef0

Co-authored-by: Jacob Heun <jacobheun@gmail.com>

vasco-santos force-pushed the feat/peerstore-persistence branch from 38c661a to 519eef0 Compare May 5, 2020 13:15

vasco-santos requested a review from jacobheun May 5, 2020 13:37

jacobheun approved these changes May 5, 2020

View reviewed changes

vasco-santos merged commit 1059c87 into 0.28.x May 5, 2020

vasco-santos deleted the feat/peerstore-persistence branch May 5, 2020 14:08

vasco-santos mentioned this pull request Jun 2, 2020

PeerStore Persistence #591

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: peerstore persistence #619

feat: peerstore persistence #619

vasco-santos commented Apr 27, 2020 •

edited

Loading

vasco-santos commented Apr 29, 2020

jacobheun commented Apr 30, 2020 •

edited

Loading

vasco-santos commented Apr 30, 2020 •

edited

Loading

jacobheun commented Apr 30, 2020

vasco-santos commented May 1, 2020

jacobheun May 5, 2020

vasco-santos May 5, 2020 •

edited

Loading

jacobheun May 5, 2020

jacobheun left a comment

feat: peerstore persistence #619

feat: peerstore persistence #619

Conversation

vasco-santos commented Apr 27, 2020 • edited Loading

Datastore

AddressBook

ProtoBook

Other books

Doubts / pending decisions

Enabled by default

Store protocols

Consider storing multiaddrs metadata

vasco-santos commented Apr 29, 2020

jacobheun commented Apr 30, 2020 • edited Loading

vasco-santos commented Apr 30, 2020 • edited Loading

jacobheun commented Apr 30, 2020

vasco-santos commented May 1, 2020

jacobheun May 5, 2020

Choose a reason for hiding this comment

vasco-santos May 5, 2020 • edited Loading

Choose a reason for hiding this comment

jacobheun May 5, 2020

Choose a reason for hiding this comment

jacobheun left a comment

Choose a reason for hiding this comment

vasco-santos commented Apr 27, 2020 •

edited

Loading

jacobheun commented Apr 30, 2020 •

edited

Loading

vasco-santos commented Apr 30, 2020 •

edited

Loading

vasco-santos May 5, 2020 •

edited

Loading