
feat: continuously publish provider records #175

Merged (1 commit) on Nov 14, 2023

Conversation

nathanielc (Collaborator) commented Nov 8, 2023

This adds a mechanism to continuously publish provider records. The behavior is that publishes are spread out over the entire publication interval with retries for any failed publishes. The effect is that each block in the store will have its hash published as a provider record at least once per interval.

Implementation Details

Some things of note about how this was implemented.

The Publisher type implements the Stream trait and continuously emits batches of Keys that need to be published. The node then tells kademlia to publish them and informs the Publisher of the results. When a batch has completed (i.e. the publisher knows the pass/fail result of each key), it starts a new batch.
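
For illustration, a minimal sketch of this shape (not the actual implementation; Key, Batch, and the internal queue are stand-ins):

use std::collections::VecDeque;
use std::pin::Pin;
use std::task::{Context, Poll};

use futures::Stream;

// Stand-in for the kademlia record key type.
type Key = Vec<u8>;

/// A batch of keys that still need to be published this interval.
struct Batch {
    keys: Vec<Key>,
}

/// Sketch of a publisher that yields batches of keys to provide.
/// The real publisher pulls keys from the block store and only starts a new
/// batch once the node has reported pass/fail for every key in the last one.
struct Publisher {
    batches: VecDeque<Vec<Key>>,
}

impl Stream for Publisher {
    type Item = Batch;

    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        match self.batches.pop_front() {
            Some(keys) => Poll::Ready(Some(Batch { keys })),
            // The real implementation registers the waker and wakes itself
            // once the next batch is ready (or after a pacing delay).
            None => Poll::Pending,
        }
    }
}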

Between each batch the publisher may sleep/delay if it's ahead of its projected deadline (the end of the interval). Structured logs report whether the publisher is ahead or behind.
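
Roughly, the pacing decision looks like this (illustrative names, not the actual code):

use std::time::{Duration, Instant};

// If the keys published so far put us ahead of an even spread over the
// interval, return how long to sleep before the next batch; otherwise zero.
fn delay_before_next_batch(
    start: Instant,
    interval: Duration,
    published: u64,
    total: u64,
) -> Duration {
    if total == 0 {
        return Duration::ZERO;
    }
    // The point in time by which `published` keys should be done if the work
    // is spread evenly across the whole interval.
    let projected_deadline = start + interval.mul_f64(published as f64 / total as f64);
    projected_deadline.saturating_duration_since(Instant::now())
}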

Metrics have been added to monitor the success/fail rate of publish events.

The kademlia protocol itself has been configured to not do any of its own republishing and to store all provider records in memory, so that it can answer queries from other peers. This means the data is duplicated on disk and in memory, which is not ideal but seems to be the best we can do for now.
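
For reference, disabling kademlia's own (re)publication looks roughly like this (assuming a libp2p release where the config type is still called KademliaConfig; newer releases rename it to kad::Config):

use std::time::Duration;
use libp2p::kad::KademliaConfig;

fn kad_config(provider_record_ttl: Duration) -> KademliaConfig {
    let mut cfg = KademliaConfig::default();
    // Leave (re)publication to our Publisher; kademlia only stores the
    // records (in its in-memory store) so it can answer provider queries.
    cfg.set_provider_publication_interval(None)
        .set_publication_interval(None)
        .set_replication_interval(None)
        .set_provider_record_ttl(Some(provider_record_ttl));
    cfg
}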

Performance

On my local system I was able to publish records at a very high rate, roughly 500K in a few hours. This seems to be because we iterate the records in hash order, so we talk to adjacent sets of peers for long runs of records and only slowly move through the keyspace. The publisher has some settings we can tweak to adjust for production workloads as needed.

@nathanielc nathanielc changed the title feat: continously publish provider records feat: continuously publish provider records Nov 8, 2023
@nathanielc nathanielc marked this pull request as ready for review November 8, 2023 18:02
p2p/src/node.rs Outdated
@@ -250,6 +264,17 @@ where
}
}
}
provide_records = self.publisher.next() => {
if let Some(kad) = self.swarm.behaviour_mut().kad.as_mut() {
if let Some(provide_records) = provide_records {
Collaborator

Suggested change (whitespace-only fix)
if let Some(provide_records) = provide_records {

For some reason, my eyes are very good at catching extra/missing whitespace 😂

Collaborator Author

Hmm I would expect cargo fmt to catch this. I'll take a look.

Collaborator Author

cargo fmt can't always format code inside of a macro, and this code block lives inside a select! macro invocation. This is why I generally like to put as little logic inside the macro as possible and just call out to functions in the bodies of the branches (rough sketch below).

Fixed
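
For what it's worth, the pattern looks roughly like this (illustrative channels and function names, not the actual node code):

use tokio::select;
use tokio::sync::mpsc;

// Keep each select! branch to a single call so rustfmt can format the real
// logic, which lives in ordinary functions.
async fn run(mut batches: mpsc::Receiver<Vec<Vec<u8>>>, mut shutdown: mpsc::Receiver<()>) {
    loop {
        select! {
            Some(batch) = batches.recv() => handle_batch(batch),
            _ = shutdown.recv() => break,
        }
    }
}

fn handle_batch(batch: Vec<Vec<u8>>) {
    println!("publishing {} keys", batch.len());
}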

one/src/lib.rs Outdated
@@ -78,7 +78,7 @@ struct DaemonOpts {
#[arg(
short,
long,
default_value = "127.0.0.1:9090",
default_value = "127.0.0.1:9464",
Contributor

Why this number vs any other?

Collaborator Author

9464 is the standard port for exposing metrics, while 9090 is the standard port for the Prometheus server itself. I got them mixed up initially; this changes to the more standard port.

.set_parallelism(config.kademlia_parallelism)
.set_query_timeout(config.kademlia_query_timeout)
.set_provider_record_ttl(config.kademlia_provider_record_ttl)
// Disable record (re)-replication and (re)-publication
Contributor

Maybe we should reference somewhere here that we do this elsewhere. As written, this looks like we are not publishing provider records at all.

@@ -1321,12 +1345,14 @@ mod tests {
// Using an in-memory DB for the tests; for a realistic benchmark a disk DB is needed.
let sql_pool = SqlitePool::connect("sqlite::memory:").await?;

let metrics = Metrics::register(&mut prometheus_client::registry::Registry::default());
Contributor

Is this recording metrics in tests?
Is this because we want the metrics from the test or just to make the tests more like release?

Collaborator Author

This is because I didn't abstract over optional metrics in the code. The types require a Metrics instance in order to be constructed. So we do not validate metrics in tests yet, but this is how we could.
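
If we ever do want to assert on metrics in a test, something like this would work with prometheus_client (hypothetical counter name; encode signature as in the 0.21-era text encoder):

use prometheus_client::encoding::text::encode;
use prometheus_client::metrics::counter::Counter;
use prometheus_client::registry::Registry;

#[test]
fn records_publish_results() {
    let mut registry = Registry::default();
    let publish_ok: Counter = Counter::default();
    registry.register("publish_ok", "Successful provider publishes", publish_ok.clone());

    // ... run the code under test, which bumps the counter ...
    publish_ok.inc();

    let mut output = String::new();
    encode(&mut output, &registry).unwrap();
    assert!(output.contains("publish_ok_total 1"));
}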
