feat: continuously publish provider records #175
Conversation
p2p/src/node.rs
Outdated
@@ -250,6 +264,17 @@ where
                }
            }
        }
        provide_records = self.publisher.next() => {
            if let Some(kad) = self.swarm.behaviour_mut().kad.as_mut() {
                if let Some(provide_records) = provide_records {
Suggested change: the line `if let Some(provide_records) = provide_records {` has an extra/missing whitespace issue (the suggested replacement differs only in whitespace).
For some reason, my eyes are very good at catching extra/missing whitespaces 😂
Hmm, I would expect `cargo fmt` to catch this. I'll take a look.
`cargo fmt` can't always format code inside of a macro, and this code block lives inside a `select!` macro invocation. This is why I generally like to put as little logic inside the macro as possible and just call out to functions in the bodies of the branches.
Fixed
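The "thin branches" pattern described above can be sketched without the async machinery. This is a hedged, dependency-free illustration (the `Event` enum and handler names are hypothetical, and in the real code the branches live inside tokio's `select!` macro, which rustfmt cannot always reach); the point is that each branch body reduces to a single function call, so all the real logic stays in code rustfmt can format:

```rust
// Stand-in for the kinds of events a `select!` loop would multiplex.
// `Option<Vec<u64>>` mimics an optional batch of kademlia keys.
enum Event {
    ProvideRecords(Option<Vec<u64>>),
    Shutdown,
}

struct Node {
    published: usize,
}

impl Node {
    // All real logic lives in an ordinary function that rustfmt can format.
    fn handle_provide_records(&mut self, records: Option<Vec<u64>>) {
        if let Some(records) = records {
            self.published += records.len();
        }
    }

    fn run(&mut self, events: impl IntoIterator<Item = Event>) {
        for event in events {
            // In the async version, each `select!` arm would be just a call:
            match event {
                Event::ProvideRecords(r) => self.handle_provide_records(r),
                Event::Shutdown => break,
            }
        }
    }
}
```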
one/src/lib.rs
Outdated
@@ -78,7 +78,7 @@ struct DaemonOpts {
     #[arg(
         short,
         long,
-        default_value = "127.0.0.1:9090",
+        default_value = "127.0.0.1:9464",
Why this number vs any other?
9464 is the standard port for hosting metrics; 9090 is the standard port for hosting Prometheus itself. I got them mixed up initially. This changes to the more standard port.
    .set_parallelism(config.kademlia_parallelism)
    .set_query_timeout(config.kademlia_query_timeout)
    .set_provider_record_ttl(config.kademlia_provider_record_ttl)
    // Disable record (re)-replication and (re)-publication
Maybe we should reference here that we do this elsewhere. As written, this looks like we are not publishing provider records at all.
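For context, a hedged sketch (not the PR's actual code) of how libp2p-kad's automatic (re)publication is typically switched off so an external publisher owns the schedule. The setter names come from libp2p-kad's `KademliaConfig`; exact names can differ between libp2p versions, and `provider_ttl` is an assumed parameter:

```rust
use std::time::Duration;

use libp2p::kad::KademliaConfig;

// Hypothetical helper: kademlia still answers queries from its stores,
// but all (re)publication timers are disabled so our Publisher drives
// the publication schedule instead.
fn kad_config(provider_ttl: Duration) -> KademliaConfig {
    let mut cfg = KademliaConfig::default();
    cfg.set_provider_record_ttl(Some(provider_ttl));
    // Disable kademlia's own record/provider republication timers.
    cfg.set_replication_interval(None);
    cfg.set_publication_interval(None);
    cfg.set_provider_publication_interval(None);
    cfg
}
```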
@@ -1321,12 +1345,14 @@ mod tests {
        // Using an in-memory DB for the tests; for realistic benchmarks a disk DB is needed.
        let sql_pool = SqlitePool::connect("sqlite::memory:").await?;

        let metrics = Metrics::register(&mut prometheus_client::registry::Registry::default());
Is this recording metrics in tests?
Is this because we want the metrics from the test or just to make the tests more like release?
This is because I didn't abstract over optional metrics in the code. The types require a Metrics instance in order to be constructed. So we do not validate metrics in tests yet, but this is how we could.
This adds a mechanism to continuously publish provider records. The behavior is that publishes are spread out over the entire publication interval with retries for any failed publishes. The effect is that each block in the store will have its hash published as a provider record at least once per interval.
Implementation Details
Some things of note about how this was implemented:
The Publisher type implements the Stream trait, continuously emitting batches of keys that need to be published. The node then tells kademlia to publish them and informs the Publisher of the results. When a batch has completed, i.e. the publisher knows the pass/fail result of each key, it starts a new batch.
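The batching contract described above can be sketched in dependency-free form. This is a hedged illustration, not the PR's code: the real Publisher implements the async `Stream` trait, while here `next_batch`/`report` and the `String` keys are hypothetical stand-ins. Keys reported as failed go back on the queue and are retried in a later batch:

```rust
use std::collections::VecDeque;

struct Publisher {
    pending: VecDeque<String>, // keys still to publish this interval
    batch_size: usize,
    in_flight: Vec<String>,    // the batch awaiting pass/fail results
}

impl Publisher {
    fn new(keys: impl IntoIterator<Item = String>, batch_size: usize) -> Self {
        Self {
            pending: keys.into_iter().collect(),
            batch_size,
            in_flight: Vec::new(),
        }
    }

    // Emit the next batch of keys; only called once the previous
    // batch's results have been reported back.
    fn next_batch(&mut self) -> Vec<String> {
        let mut batch = Vec::new();
        for _ in 0..self.batch_size {
            match self.pending.pop_front() {
                Some(key) => batch.push(key),
                None => break,
            }
        }
        self.in_flight = batch.clone();
        batch
    }

    // The node reports per-key results; failed keys are re-queued
    // so they are retried in a later batch.
    fn report(&mut self, failed: &[String]) {
        for key in std::mem::take(&mut self.in_flight) {
            if failed.contains(&key) {
                self.pending.push_back(key);
            }
        }
    }
}
```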
Between each batch the publisher may sleep/delay if it's ahead of its projected deadline (the end of the interval). Structured logs report whether the publisher is ahead or behind.
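The ahead/behind decision can be sketched as a small pacing function. The name and exact formula here are assumptions, not the PR's code; the idea is to compare how far through the key set we are against how far through the publication interval we are:

```rust
use std::time::Duration;

// Decide whether to sleep before the next batch. Returns Some(delay)
// when the publisher is ahead of its projected deadline, None when it
// is behind (publish the next batch immediately; a structured log
// could report the lag).
fn delay_before_next_batch(
    interval: Duration, // total publication interval
    elapsed: Duration,  // time spent so far in this interval
    keys_done: usize,   // keys already published this interval
    keys_total: usize,  // keys that must be published by the deadline
) -> Option<Duration> {
    if keys_total == 0 || keys_done == 0 {
        return None;
    }
    // Projected time by which `keys_done` keys should have been
    // published if work were spread evenly across the interval.
    let target = interval.mul_f64(keys_done as f64 / keys_total as f64);
    // Ahead of schedule: sleep until the projected point. Behind:
    // checked_sub underflows to None, so we continue immediately.
    target.checked_sub(elapsed)
}
```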
Metrics have been added to monitor the success/fail rate of publish events.
The kademlia protocol itself has been configured to not do any of its own republishing and to store all provider records in memory. This way it can answer queries from other peers. This means the data is duplicated on disk and in memory, which is not ideal but seems to be the best we can do for now.
Performance
On my local system I was able to publish records at a very high rate, 500K in a few hours. This seems to be because we iterate the records in order of hash, so we talk to adjacent sets of peers for large numbers of records and then slowly move through the key space. The publisher has some settings we can tweak to adjust for production workloads as needed.