feat(congestion_control) - handling missing chunks #11274
Looks great! Fitted in really smoothly, almost as if we designed it with this addition in mind from the beginning. :)
Adding tests would make sense, yes. Please re-request a review if you add it in this PR, so I can take another look before merging.
core/primitives/src/block.rs
let congestion_info = chunk.congestion_info().unwrap_or_default();
let height_included = chunk.height_included();
let height_current = self.header().height();
let missed_chunks_count = height_current - height_included;
nit: consider using checked_sub with something like .expect("current cannot be older than included") to make it more obvious that the code technically has a panic path here but that we have semantic guarantees that make it unreachable.
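A minimal sketch of what this suggestion could look like. The function name `missed_chunks_count` and the use of plain `u64` for block heights are assumptions made for illustration, not the code from the PR. With a bare `-`, an underflow panics only when overflow checks are enabled (the default in debug builds) and wraps otherwise, so `checked_sub` plus `expect` makes the panic path explicit and documents the invariant in every build profile.

```rust
/// Hypothetical helper: number of blocks since the chunk was last included.
/// Heights are modelled as plain `u64` here (an assumption for this sketch).
fn missed_chunks_count(height_current: u64, height_included: u64) -> u64 {
    height_current
        // `checked_sub` returns `None` on underflow instead of wrapping or
        // panicking implicitly.
        .checked_sub(height_included)
        // The invariant is stated where it is relied upon.
        .expect("current cannot be older than included")
}
```

For example, `missed_chunks_count(105, 100)` yields 5, while `missed_chunks_count(1, 2)` panics with the message above.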
// TODO(congestion_control) Set missed chunks count correctly.
if self.congestion_level(0) < 1.0 {
Hm, how would this even work correctly?
I mean, here we set our own congestion info for other shards to read later. So in this code line, we cannot know how many chunks will be missed between now and then, right?
So I guess we would have to override before using it. But then it would have to be specified in the protocol, rather than giving clients the choice. 🤔
Or, based on today's discussion, maybe we should simply never activate an allowed shard based on missing chunks, right? Then 0 works here, just needs a comment to explain the thinking.
The only possible concern is that deadlocks come back if we deactivate round-robin.
But in my mind, that seems fine. The condition for deadlocks would be that one chunk makes no more progress at all. And what should we do in this case? Stopping traffic seems better than blowing up all shards with ever-growing numbers of receipts to buffer.
cc @Longarithm this is the missing chunks handling if you are interested.
@jakmeier I added some simple tests, have a look.
tests look good, too :)
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11274      +/-   ##
==========================================
+ Coverage   70.99%   71.02%   +0.03%
==========================================
  Files         781      781
  Lines      155505   155627     +122
  Branches   155505   155627     +122
==========================================
+ Hits       110407   110541     +134
+ Misses      40323    40317       -6
+ Partials     4775     4769       -6
When there are multiple missing chunks in a row in a shard, we want to consider that shard as congested. This prevents outgoing receipts to that shard from accumulating and then blowing up the state witness size.
I decided to not embed the information about missing chunks in the congestion info but rather I added a new struct called ExtendedCongestionInfo. The Block now constructs congestion info, extends it with the information about missing chunks and provides this new struct to the runtime.
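A rough sketch of the shape this could take, under stated assumptions: the field names, the placeholder `CongestionInfo` contents, and the `extend_for_block` helper are all hypothetical and only illustrate the idea of wrapping the existing info together with a missed-chunks count derived from block heights.

```rust
/// Placeholder for the existing congestion info (hypothetical single field;
/// the real struct tracks more than this).
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
struct CongestionInfo {
    buffered_receipts_gas: u128,
}

/// Congestion info plus the number of chunks missed in a row.
/// The missed-chunks count lives next to the congestion info, not inside it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ExtendedCongestionInfo {
    congestion_info: CongestionInfo,
    missed_chunks_count: u64,
}

impl ExtendedCongestionInfo {
    fn new(congestion_info: CongestionInfo, missed_chunks_count: u64) -> Self {
        Self { congestion_info, missed_chunks_count }
    }
}

/// Hypothetical block-side helper. Each entry is one shard's
/// `(congestion info if present, height at which its chunk was included)`.
fn extend_for_block(
    chunks: &[(Option<CongestionInfo>, u64)],
    height_current: u64,
) -> Vec<ExtendedCongestionInfo> {
    chunks
        .iter()
        .map(|&(info, height_included)| {
            let missed = height_current
                .checked_sub(height_included)
                .expect("current cannot be older than included");
            ExtendedCongestionInfo::new(info.unwrap_or_default(), missed)
        })
        .collect()
}
```

Keeping the count outside `CongestionInfo` means the stored info stays unchanged, while the runtime still receives both pieces through one value.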
Since the congestion level can no longer be calculated without the missing chunks information, I added a missing_chunks_count argument to all methods that rely on the congestion level. This makes sure the users of those structs do not forget about the missing chunks: the compiler will flag any call that omits it. In the runtime, the ExtendedCongestionInfo struct acts as a helper to make it as convenient as it used to be.
The congestion level itself is now the maximum of 4 values: the first three as before and a new one for missing chunks. In this PR I made it so that 10 missed chunks in a row lead to full congestion; that number is to be adjusted based on data. Other changes can also be considered, such as adding the missing chunks congestion to the max of the others. I'm open to suggestions here.
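The calculation described above could be sketched roughly as follows. The names `incoming`, `outgoing` and `memory` for the three existing components, the constant name, and the linear scaling of the missed-chunks component are assumptions for illustration; the description only fixes that the level is a maximum of four values and that 10 missed chunks mean full congestion.

```rust
/// Assumed constant: this many missed chunks in a row means full congestion.
/// The value 10 comes from the description and is meant to be tuned.
const MAX_CONGESTION_MISSED_CHUNKS: u64 = 10;

/// Missed-chunks component in the range [0.0, 1.0].
/// Linear scaling is an assumption of this sketch.
fn missed_chunks_congestion(missed_chunks_count: u64) -> f64 {
    let ratio = missed_chunks_count as f64 / MAX_CONGESTION_MISSED_CHUNKS as f64;
    ratio.clamp(0.0, 1.0)
}

/// Overall congestion level: the maximum of the three existing components
/// (hypothetical names, each already normalised to [0.0, 1.0]) and the new
/// missed-chunks component. Requiring `missed_chunks_count` as an argument
/// means callers cannot compute a level without supplying it.
fn congestion_level(
    incoming: f64,
    outgoing: f64,
    memory: f64,
    missed_chunks_count: u64,
) -> f64 {
    incoming
        .max(outgoing)
        .max(memory)
        .max(missed_chunks_congestion(missed_chunks_count))
}
```

With these assumptions, a shard with otherwise low congestion (say 0.2, 0.1, 0.3) and 5 missed chunks in a row gets level 0.5, and 10 or more missed chunks always give 1.0 regardless of the other components.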
It's in draft because I still need to add tests for this.