Fair queuing for block processor #4476
Conversation
Bad, very bad.
That's not what's happening here. Everybody gets a fair chance to submit their blocks to the network.
A better solution to this has already been implemented in #4454 afaik. Instead of limiting legit services to wait for the previous confirmation, all the PRs (principal representatives) in the network wait for the previous confirmation before publishing the block. So we have the best of both worlds: reduced traffic and no artificial limits for legit services that want to publish all their blocks at once.
As gr0vity-dev explained, the point of this PR is not to somehow filter only "trusted" peers, but to ensure every peer is treated equally and ensure QoS for legit transactions, even under constant network stress. If there are 100 peers and 99 of them are spamming, that 1 well-behaved peer will have its transaction preempt the rest of the traffic.
The "configurable size and priority" part got me reading this all wrong.
namespace nano
{
template <typename Request, typename Source>
A brief comment here to explain what fair_queue does would be nice.
To help with the issue, here is my understanding; use it as you see fit:
A fair queue holds objects of type Request that have arrived from an origin described by type origin_type.
A fair queue behaves like a single queue at a high level, but it is composed of a number of sub-queues.
It creates a sub-queue for each origin, and each sub-queue has a priority and a max_size.
The purpose of the fair queue is to ensure that blocks are taken out of it in a round-robin fashion from each sub-queue.
So, for example, at most 10 blocks from each origin before moving to the next origin.
Origin objects have a concept of liveness, and their sub-queue is deleted when the origin is no longer alive.
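[Editor's note] A minimal, self-contained sketch of that understanding (illustrative only: the class name, defaults and details below are assumptions, not the PR's actual code; the real template is nano::fair_queue<Request, Source>):

#include <cstddef>
#include <deque>
#include <map>
#include <utility>

template <typename Request, typename Source>
class fair_queue_sketch
{
	struct sub_queue
	{
		std::deque<Request> requests;
		std::size_t priority{ 1 }; // items this origin may yield per round-robin turn
		std::size_t max_size{ 128 }; // bound on queued requests from this origin
	};

	std::map<Source, sub_queue> queues; // one sub-queue per origin
	typename std::map<Source, sub_queue>::iterator current{ queues.end () };
	std::size_t counter{ 0 }; // items already taken from `current` this turn

public:
	// Returns false when this origin's sub-queue is full; other origins are unaffected.
	bool push (Source const & origin, Request request)
	{
		auto & queue = queues[origin]; // sub-queue created lazily per origin
		if (queue.requests.size () >= queue.max_size)
		{
			return false;
		}
		queue.requests.push_back (std::move (request));
		return true;
	}

	bool empty () const
	{
		for (auto const & [origin, queue] : queues)
		{
			if (!queue.requests.empty ())
			{
				return false;
			}
		}
		return true;
	}

	// Precondition: !empty (). Takes up to `priority` items from each
	// sub-queue before the iterator advances to the next origin.
	Request next ()
	{
		while (true)
		{
			if (current == queues.end ())
			{
				current = queues.begin ();
			}
			auto & queue = current->second;
			if (counter < queue.priority && !queue.requests.empty ())
			{
				++counter;
				auto request = std::move (queue.requests.front ());
				queue.requests.pop_front ();
				return request;
			}
			++current;
			counter = 0;
		}
	}
};

The key property is that an origin flooding its own sub-queue only ever loses its own blocks; other origins keep their capacity and their round-robin turns.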
I see, my description could be a bit misleading. Only local blocks (submitted via RPC) and bootstrapping are treated with more priority and get allocated larger queues.
};

private:
struct entry
The name entry is a bit too generic for something so important. How about origin_queue?
Is this a possible problem? If a node (for example, one of @gr0vity-dev's test scripts) creates a TCP connection, pushes some blocks and then immediately closes the connection, and the cleanup function is called fast enough, will the blocks be dropped? That would not be a major problem in the real network (because connections are not typically short-lived) but it might cause problems with test scripts.
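[Editor's note] A hedged sketch of the mechanism behind that question, assuming sub-queues are kept in a map keyed by origins that expose an alive () check (names are illustrative, not the PR's code):

#include <map>

template <typename Origin, typename SubQueue>
void cleanup (std::map<Origin, SubQueue> & queues)
{
	// Erase the sub-queue of any origin that is no longer alive. Any blocks
	// still sitting in that sub-queue are dropped along with it, which is
	// exactly the short-lived-connection scenario asked about above.
	std::erase_if (queues, [] (auto const & item) {
		return !item.first.alive ();
	});
}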
TEST (fair_queue, construction)
{
nano::fair_queue<source_enum, int> queue;
These types are reversed compared to other tests.
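[Editor's note] Given the template is declared as template <typename Request, typename Source>, a declaration consistent with that ordering would presumably be:

nano::fair_queue<int, source_enum> queue; // Request = int, Source = source_enum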
forced.pop_front ();
return entry;
}
debug_assert (!queue.empty ()); // This should be checked before calling next
The behaviour of the forced queue is changed here.
Previously, forced always had priority, whereas now it is one of the round-robin queues.
Due to the round-robin nature of the implementation, maybe this is fine, but it is a difference in behaviour that I thought I should flag up.
case nano::block_source::local:
	return config.priority_local;
default:
	return 1;
This will give block_source::forced a priority of 1.
I would expect forced to have a higher priority.
Okay, I can make it higher priority, but can you explain exactly why it should be higher priority?
Force insertion is used by elections to process forked blocks. I must be missing something, but it seems to me that giving higher priority to forked elections/blocks is unwanted behavior. Why should we prioritize forks?
using queue_t = std::deque<Request>;
queue_t requests;

size_t priority;
Priority seems like a misleading term because blocks with the highest priority do not necessarily go first.
Throughput factor, queue width, or something else indicating heavier flow might be a better name.
I do not feel very strongly about this; feel free to ignore if you disagree.
I'm happy to change it to a better name, though for aesthetic reasons it must be a single word.
allowance or quota?
@dsiganos Is this behavior a problem or a feature? What if a peer repeatedly reconnects and pushes blocks?
Minor comments but I think it's at a good point to merge and iterate.
@@ -3,6 +3,7 @@
#include <nano/lib/locks.hpp>
#include <nano/lib/timer.hpp>
#include <nano/node/transport/channel.hpp>
#include <nano/node/transport/fake.hpp>
This can be forward declared
@@ -38,6 +38,26 @@ enum class block_source
std::string_view to_string (block_source);
nano::stat::detail to_stat_detail (block_source);

class block_processor_config final
This could be its own file. Including node_config.hpp pulls in blockprocessor.hpp -> fair_queue.hpp -> transport/channel.hpp
I've used this pattern of declaring the config next to the class in quite a few places at this point. I find it very convenient when working, to the point where having to always split it off and keep it in a separate file would probably outweigh any compilation speed benefits, at least on my hardware.
But this can be fixed much more easily at the nodeconfig level, by using the same pattern as in the node header itself: forward declare the configs and store unique_ptrs to them, with an additional reference for convenience.
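[Editor's note] A minimal sketch of that forward-declaration pattern, with assumed names:

#include <memory>

namespace nano
{
class block_processor_config; // forward declaration; avoids pulling in blockprocessor.hpp

class node_config
{
public:
	node_config (); // defined in the .cpp, where block_processor_config is complete
	~node_config (); // ditto: unique_ptr of an incomplete type needs its deleter there

private:
	std::unique_ptr<nano::block_processor_config> block_processor_config_impl;

public:
	nano::block_processor_config & block_processor_config; // convenience reference
};
}

// node_config.cpp would then initialize both members, e.g.:
// node_config::node_config () :
//	block_processor_config_impl{ std::make_unique<nano::block_processor_config> () },
//	block_processor_config{ *block_processor_config_impl }
// {
// }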
The current design of the block processor queue is problematic. We have a single, large queue that is shared by all peers and node components (bootstrapping, local blocks, unchecked table). This means that in periods of high activity this queue can quickly become overwhelmed, with excessively active peers consuming its capacity.
I propose a new design: a fair queue where each peer and each component gets its own smaller queue with a configurable size and priority. The block processor then uses this queue to process incoming blocks in a (weighted) round-robin manner. This should ensure that even when the network is under stress, data coming from well-behaved peers is ingested quickly.
Currently all peers are treated with equal priority, but it is possible to modify this implementation to give more resources to representatives, once we implement a reliable way to authenticate channels.
The default size of the queue per peer is 128 blocks. This should be enough for the live network, but it should probably be adjusted for beta in order to be able to saturate the network.
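[Editor's note] A hedged sketch of what the configuration surface described here might look like (priority_local appears in the diff earlier in the thread; the other field names and default values are assumptions for illustration):

#include <cstddef>

class block_processor_config final
{
public:
	// Per-peer sub-queue capacity; 128 is the default named in the description.
	std::size_t max_peer_queue{ 128 };
	// Node-internal origins (local RPC submissions, bootstrapping) get larger
	// queues and higher priority, per the discussion above; values illustrative.
	std::size_t max_system_queue{ 16 * 1024 };
	std::size_t priority_live{ 1 };
	std::size_t priority_local{ 16 };
	std::size_t priority_bootstrap{ 8 };
};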