Fair queuing for block processor #4476
Conversation
Bad, very bad.
That's not what's happening here. Everybody gets a fair chance to submit their blocks to the network.
A better solution to this has already been implemented in #4454 afaik. Instead of limiting legit services to wait for the previous confirmation, all the PRs (principal representatives) in the network wait for the previous confirmation before publishing the block. So we have the best of both worlds: reduced traffic and no artificial limits for legit services that want to publish all their blocks at once.
As gr0vity-dev explained, the point of this PR is not to somehow filter only "trusted" peers, but to ensure every peer is treated equally and ensure QoS for legit transactions, even under constant network stress. If there are 100 peers and 99 of them are spamming, that 1 well-behaved peer will have its transaction preempt the rest of the traffic.
The "configurable size and priority" part got me reading this all wrong.
namespace nano
{
template <typename Request, typename Source>
A brief comment here to explain what fair_queue does would be nice.
To help with the issue, here is my understanding; use it as you see fit:
A fair queue holds objects of type Request that have arrived from an origin described by type origin_type.
A fair queue behaves like a single queue at a high level, but it is composed of a number of sub-queues.
It creates a sub-queue for each origin, and each sub-queue has a priority and a max_size.
The purpose of the fair queue is to ensure that blocks are taken out of it in a round-robin fashion from each sub-queue.
So, for example, at most 10 blocks from each origin before moving to the next origin.
Origin objects have a concept of liveness, and their sub-queue is deleted when the origin is no longer alive.
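[Editor's note] A minimal, self-contained sketch of that understanding (illustrative only: the class name, defaults and details below are assumptions, not the PR's actual code; the real template is nano::fair_queue<Request, Source>):

#include <cstddef>
#include <deque>
#include <map>
#include <utility>

template <typename Request, typename Source>
class fair_queue_sketch
{
	struct sub_queue
	{
		std::deque<Request> requests;
		std::size_t priority{ 1 }; // items this origin may yield per round-robin turn
		std::size_t max_size{ 128 }; // bound on queued requests from this origin
	};

	std::map<Source, sub_queue> queues; // one sub-queue per origin
	typename std::map<Source, sub_queue>::iterator current{ queues.end () };
	std::size_t counter{ 0 }; // items already taken from `current` this turn

public:
	// Returns false when this origin's sub-queue is full; other origins are unaffected.
	bool push (Source const & origin, Request request)
	{
		auto & queue = queues[origin]; // sub-queue created lazily per origin
		if (queue.requests.size () >= queue.max_size)
		{
			return false;
		}
		queue.requests.push_back (std::move (request));
		return true;
	}

	bool empty () const
	{
		for (auto const & [origin, queue] : queues)
		{
			if (!queue.requests.empty ())
			{
				return false;
			}
		}
		return true;
	}

	// Precondition: !empty (). Takes up to `priority` items from each
	// sub-queue before the iterator advances to the next origin.
	Request next ()
	{
		while (true)
		{
			if (current == queues.end ())
			{
				current = queues.begin ();
			}
			auto & queue = current->second;
			if (counter < queue.priority && !queue.requests.empty ())
			{
				++counter;
				auto request = std::move (queue.requests.front ());
				queue.requests.pop_front ();
				return request;
			}
			++current;
			counter = 0;
		}
	}
};

The key property is that an origin flooding its own sub-queue only ever loses its own blocks; other origins keep their capacity and their round-robin turns.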
I see, my description could be a bit misleading. Only local blocks (submitted via RPC) and bootstrapping are treated with more priority and get allocated larger queues.
};

private:
struct entry
The name entry is a bit too generic for something so important. How about origin_queue?
Is this a possible problem? If a node (for example, one of @gr0vity-dev's test scripts) creates a TCP connection, pushes some blocks and then immediately closes the connection, and the cleanup function is called fast enough, will the blocks be dropped? That would not be a major problem in the real network (because connections are not typically short-lived) but it might cause problems with test scripts.
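[Editor's note] A hedged sketch of the mechanism behind that question, assuming sub-queues are kept in a map keyed by origins that expose an alive () check (names are illustrative, not the PR's code):

#include <map>

template <typename Origin, typename SubQueue>
void cleanup (std::map<Origin, SubQueue> & queues)
{
	// Erase the sub-queue of any origin that is no longer alive. Any blocks
	// still sitting in that sub-queue are dropped along with it, which is
	// exactly the short-lived-connection scenario asked about above.
	std::erase_if (queues, [] (auto const & item) {
		return !item.first.alive ();
	});
}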
TEST (fair_queue, construction)
{
nano::fair_queue<source_enum, int> queue;
These types are reversed compared to other tests.
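[Editor's note] Given the template is declared as template <typename Request, typename Source>, a declaration consistent with that ordering would presumably be:

nano::fair_queue<int, source_enum> queue; // Request = int, Source = source_enum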
forced.pop_front ();
return entry;
}
debug_assert (!queue.empty ()); // This should be checked before calling next
The behaviour of the forced queue is changed here.
Previously, forced always had priority, whereas now it is one of the round-robin queues.
Due to the round-robin nature of the implementation, maybe this is fine, but it is a difference in behaviour that I thought I should flag up.
case nano::block_source::local:
	return config.priority_local;
default:
	return 1;
This will give block_source::forced a priority of 1.
I would expect forced to have a higher priority.
Okay, I can make it higher priority, but can you explain exactly why it should be higher priority?
Force insertion is used by elections to process forked blocks. I must be missing something, but it seems to me that giving higher priority to forked elections/blocks is unwanted behavior. Why should we prioritize forks?
using queue_t = std::deque<Request>;
queue_t requests;

size_t priority;
Priority seems like a misleading term because blocks with the highest priority do not necessarily go first.
Throughput factor, queue width, or something else indicating heavier flow might be a better name.
I do not feel very strongly about this; feel free to ignore if you disagree.
I'm happy to change it to a better name, though for aesthetic reasons it must be a single word.
allowance or quota?
@dsiganos Is this behavior a problem or a feature? What if a peer repeatedly reconnects and pushes blocks?
Minor comments but I think it's at a good point to merge and iterate.
@@ -3,6 +3,7 @@
#include <nano/lib/locks.hpp>
#include <nano/lib/timer.hpp>
#include <nano/node/transport/channel.hpp>
#include <nano/node/transport/fake.hpp>
This can be forward declared
@@ -38,6 +38,26 @@ enum class block_source
std::string_view to_string (block_source);
nano::stat::detail to_stat_detail (block_source);

class block_processor_config final
This could be its own file. Including node_config.hpp pulls in blockprocessor.hpp -> fair_queue.hpp -> transport/channel.hpp
I've used this pattern of declaring the config next to the class in quite a few places at this point. I find it very convenient when working, to the point where having to always split it off and keep it in a separate file would probably outweigh any compilation speed benefits, at least on my hardware.
But this can be fixed much more easily at the nodeconfig level, by using the same pattern as in the node header itself: forward declare the configs and store unique_ptrs to them, with an additional reference for convenience.
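[Editor's note] A minimal sketch of that forward-declaration pattern, with assumed names:

#include <memory>

namespace nano
{
class block_processor_config; // forward declaration; avoids pulling in blockprocessor.hpp

class node_config
{
public:
	node_config (); // defined in the .cpp, where block_processor_config is complete
	~node_config (); // ditto: unique_ptr of an incomplete type needs its deleter there

private:
	std::unique_ptr<nano::block_processor_config> block_processor_config_impl;

public:
	nano::block_processor_config & block_processor_config; // convenience reference
};
}

// node_config.cpp would then initialize both members, e.g.:
// node_config::node_config () :
//	block_processor_config_impl{ std::make_unique<nano::block_processor_config> () },
//	block_processor_config{ *block_processor_config_impl }
// {
// }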
The current design of the block processor queue is problematic. We have a single, large queue that is shared by all peers and node components (bootstrapping, local blocks, unchecked table). This means that in periods of high activity this queue can quickly become overwhelmed, with excessively active peers consuming its capacity.
I propose a new design: a fair queue where each peer and each component gets its own smaller queue with a configurable size and priority. The block processor then uses this queue to process incoming blocks in a (weighted) round-robin manner. This should ensure that even when the network is under stress, data coming from well-behaved peers is ingested quickly.
Currently all peers are treated with equal priority, but it is possible to modify this implementation to give more resources to representatives, once we implement a reliable way to authenticate channels.
The default size of the queue per peer is 128 blocks. This should be enough for the live network, but it should probably be adjusted for beta in order to be able to saturate the network.
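[Editor's note] A hedged sketch of what the configuration surface described here might look like (priority_local appears in the diff earlier in the thread; the other field names and default values are assumptions for illustration):

#include <cstddef>

class block_processor_config final
{
public:
	// Per-peer sub-queue capacity; 128 is the default named in the description.
	std::size_t max_peer_queue{ 128 };
	// Node-internal origins (local RPC submissions, bootstrapping) get larger
	// queues and higher priority, per the discussion above; values illustrative.
	std::size_t max_system_queue{ 16 * 1024 };
	std::size_t priority_live{ 1 };
	std::size_t priority_local{ 16 };
	std::size_t priority_bootstrap{ 8 };
};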