Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: run publisher in its own task #188

Merged
merged 1 commit into from
Nov 16, 2023

Conversation

nathanielc
Copy link
Collaborator

This change makes the Publisher logic run in its own task. This is to relieve CPU pressure from the swarm task.

Copy link

linear bot commented Nov 15, 2023

WS1-1339 DHT flaky tests

We believe the DHT publish logic is starving the swarm task. We should run it on its own task.

DOD:

  • End-to-End tests consistently pass with DHT changes

@nathanielc
Copy link
Collaborator Author

Locally this works well, however I'd like to test in dev/QA with better load to determine how well it works. We can revert if its not sufficient.

let mut stream = PublisherStream::new(results_rx, interval, block_store, metrics.clone());
let task_metrics = metrics.clone();
tokio::spawn(async move {
while let Some(batch) = stream.next().await {
Copy link
Collaborator

@smrz2001 smrz2001 Nov 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So:

  • The Publisher drives PublisherStream in a separate task.
  • PublisherStream builds and returns batches into a channel that gets polled by the swarm task.
  • The swarm publishes a batch and returns results via another channel that gets consumed first before a new batch is returned.

Does this sound like an accurate description?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it's the SQL lookups/batching logic that's been moved out from the swarm task to its own task, via PublisherStream.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PublisherStream builds and returns batches into a channel that gets polled by the swarm task.

This part is technically incorrect. Both the Publisher and PublisherStream types implement the Stream trait. The swarm is unware that PublisherStream exists or that the channel exists. So the swarm task polls the Publisher for the next batch. The only work the Publisher does is to read a channel if something is available.

The swarm publishes a batch and returns results via another channel that gets consumed first before a new batch is returned.

Yes except the swarm is unware of the results channel as well. It simply calls the handle method on the Publisher which puts the result on a channel that the PublisherStream reads.

So generally correct except that the Publisher abstraction means that the swarm doesn't care that channels and a separate task are being used under the hood.

Perhaps I should rename PublisherStream to PublisherWorker? That way its not misleading that they both implement Stream?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah gotcha, makes sense, ty! I got confused by the two layers of streams.

Yeah, I think calling it PublisherWorker would help.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or perhaps even PublisherStream and PublisherWorkerStream but not a strong preference.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the names to Publisher and PublisherWorker. Keep it simple. I have added comments about the fact they implement the Stream trait and why.

Copy link
Collaborator

@smrz2001 smrz2001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@nathanielc nathanielc added this pull request to the merge queue Nov 16, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 16, 2023
This change makes the Publisher logic run in its own task. This is to
relieve CPU pressure from the swarm task.
@nathanielc nathanielc force-pushed the feature/ws1-1339-dht-flaky-tests branch from 12f93cb to 92fce7b Compare November 16, 2023 16:53
@nathanielc nathanielc added this pull request to the merge queue Nov 16, 2023
Merged via the queue into main with commit 1304660 Nov 16, 2023
4 checks passed
@nathanielc nathanielc deleted the feature/ws1-1339-dht-flaky-tests branch November 16, 2023 17:22
@smrz2001 smrz2001 mentioned this pull request Jan 23, 2024
@github-actions github-actions bot mentioned this pull request Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants