
feat(congestion_control) - handling missing chunks #11274

Merged
merged 7 commits into from
May 14, 2024

Conversation

wacban
Contributor

@wacban wacban commented May 9, 2024

When there are multiple missing chunks in a row in a shard, we want to consider that shard congested. This is in order to prevent outgoing receipts to that shard from accumulating and then blowing up the state witness size.

I decided not to embed the missing chunks information in the congestion info itself, but instead added a new struct called ExtendedCongestionInfo. The Block now constructs the congestion info, extends it with the missing chunks information, and provides this new struct to the runtime.

Since the congestion level can no longer be calculated without the missing chunks information, I added a missed_chunks_count argument to all methods that rely on the congestion level. This ensures that users of those structs do not forget about the missing chunks - the compiler will complain if they do. In the runtime, the ExtendedCongestionInfo struct acts as a helper to keep the API as convenient as it used to be.
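A minimal sketch of the wrapper described above; the field types and method names here are illustrative assumptions, not the actual nearcore API:

```rust
/// Simplified stand-in for the on-chain congestion info (illustrative fields only).
#[derive(Clone, Copy, Default)]
struct CongestionInfo {
    delayed_receipts_gas: u128,
    buffered_receipts_gas: u128,
    receipt_bytes: u64,
}

/// Pairs the chunk's congestion info with the chain-derived missed chunks
/// count, so callers of congestion-level methods cannot forget it.
struct ExtendedCongestionInfo {
    congestion_info: CongestionInfo,
    missed_chunks_count: u64,
}

impl ExtendedCongestionInfo {
    fn new(congestion_info: CongestionInfo, missed_chunks_count: u64) -> Self {
        Self { congestion_info, missed_chunks_count }
    }
}
```

The point of the wrapper is that the missed chunks count travels together with the congestion info, instead of being an extra argument that call sites could silently omit.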

The congestion level itself is now the maximum of four values - the first three as before, plus a new one for missing chunks. In this PR I made it so that 10 missed chunks in a row lead to full congestion - that number is to be adjusted based on data. Other designs can also be considered, such as adding the missing chunks congestion on top of the max of the others. I'm open to suggestions here.
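A sketch of the max-of-four computation described above, assuming the three existing components are already normalized to [0, 1]; the function names and the threshold constant are illustrative, with 10 missed chunks mapping to full congestion as in this PR:

```rust
/// Assumed threshold from the PR description: 10 missed chunks in a row
/// correspond to full congestion. To be tuned based on data.
const MAX_CONGESTION_MISSED_CHUNKS: u64 = 10;

/// Map the missed chunks count to a congestion component in [0, 1].
fn missed_chunks_congestion(missed_chunks_count: u64) -> f64 {
    (missed_chunks_count as f64 / MAX_CONGESTION_MISSED_CHUNKS as f64).min(1.0)
}

/// Congestion level as the maximum of the three pre-existing components
/// (names are placeholders) and the new missed-chunks component.
fn congestion_level(
    incoming_congestion: f64,
    outgoing_congestion: f64,
    memory_congestion: f64,
    missed_chunks_count: u64,
) -> f64 {
    incoming_congestion
        .max(outgoing_congestion)
        .max(memory_congestion)
        .max(missed_chunks_congestion(missed_chunks_count))
}
```

With this shape, a shard that misses 10 or more chunks in a row is fully congested regardless of its other components, while fewer missed chunks only raise the level if the other components are lower.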

It's in draft because I still need to add tests for this.

@wacban wacban requested a review from jakmeier May 9, 2024 13:01
Base automatically changed from waclaw-adv-2 to master May 9, 2024 13:30

@jakmeier jakmeier left a comment


Looks great! Fitted in really smoothly, almost as if we designed it with this addition in mind from the beginning. :)

Adding tests would make sense, yes. Please re-request a review if you add it in this PR, so I can take another look before merging.

let congestion_info = chunk.congestion_info().unwrap_or_default();
let height_included = chunk.height_included();
let height_current = self.header().height();
let missed_chunks_count = height_current - height_included;

nit: consider using checked_sub with something like .expect("current cannot be older than included") to make it more obvious that the code technically has a panic path here but that we have semantic guarantees that make it unreachable.
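A sketch of the reviewer's suggestion applied to the snippet above, with the heights simplified to plain u64 values; the function name is illustrative:

```rust
/// Compute the number of chunks missed since the last included chunk.
/// checked_sub makes the (semantically unreachable) panic path explicit:
/// the current height can never be below the included height.
fn missed_chunks_count(height_current: u64, height_included: u64) -> u64 {
    height_current
        .checked_sub(height_included)
        .expect("current cannot be older than included")
}
```

Compared to a bare subtraction, which would panic with a generic overflow message only in debug builds and silently wrap in release builds, this version fails loudly and documents the invariant at the call site.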

core/primitives/src/congestion_info.rs Show resolved Hide resolved
core/primitives/src/congestion_info.rs Show resolved Hide resolved
Comment on lines 422 to 423
// TODO(congestion_control) Set missed chunks count correctly.
if self.congestion_level(0) < 1.0 {

Hm, how would this even work correctly?

I mean, here we set our own congestion info for other shards to read later. So at this code line, we cannot know how many chunks will be missed between now and then, right?
So I guess we would have to override it before using it. But then that would have to be specified in the protocol, rather than giving clients the choice. 🤔

Or, based on today's discussion, maybe we should simply never activate an allowed shard based on missing chunks, right? Then 0 works here, just needs a comment to explain the thinking.


The only possible concern is that deadlocks come back if we deactivate round-robin.

But in my mind, that seems fine. The condition for a deadlock would be that one chunk makes no progress at all. And what should we do in that case? Stopping traffic seems better than blowing up all shards with ever-growing numbers of receipts to buffer.

@wacban wacban marked this pull request as ready for review May 10, 2024 09:09
@wacban wacban requested a review from a team as a code owner May 10, 2024 09:09
@wacban wacban requested a review from tayfunelmas May 10, 2024 09:09
@wacban
Contributor Author

wacban commented May 10, 2024

cc @Longarithm this is the missing chunks handling if you are interested.

@wacban
Contributor Author

wacban commented May 13, 2024

Adding tests would make sense, yes. Please re-request a review if you add it in this PR, so I can take another look before merging.

@jakmeier I added some simple tests, have a look.

@wacban wacban requested a review from jakmeier May 13, 2024 10:40

@jakmeier jakmeier left a comment


tests look good, too :)

@wacban wacban added this pull request to the merge queue May 14, 2024

codecov bot commented May 14, 2024

Codecov Report

Attention: Patch coverage is 95.93023%, with 7 lines in your changes missing coverage. Please review.

Project coverage is 71.02%. Comparing base (0a61a29) to head (e679a39).
Report is 8 commits behind head on master.

Files Patch % Lines
core/primitives/src/congestion_info.rs 95.10% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11274      +/-   ##
==========================================
+ Coverage   70.99%   71.02%   +0.03%     
==========================================
  Files         781      781              
  Lines      155505   155627     +122     
  Branches   155505   155627     +122     
==========================================
+ Hits       110407   110541     +134     
+ Misses      40323    40317       -6     
+ Partials     4775     4769       -6     
Flag Coverage Δ
backward-compatibility 0.24% <0.00%> (-0.01%) ⬇️
db-migration 0.24% <0.00%> (-0.01%) ⬇️
genesis-check 1.40% <0.00%> (-0.01%) ⬇️
integration-tests 37.16% <46.51%> (-0.03%) ⬇️
linux 69.08% <90.69%> (+0.05%) ⬆️
linux-nightly 70.49% <95.93%> (+0.02%) ⬆️
macos 52.53% <89.88%> (+0.11%) ⬆️
pytests 1.62% <0.00%> (-0.01%) ⬇️
sanity-checks 1.42% <0.00%> (-0.01%) ⬇️
unittests 65.46% <95.90%> (+0.03%) ⬆️
upgradability 0.29% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


Merged via the queue into master with commit e3c8f1f May 14, 2024
28 of 29 checks passed
@wacban wacban deleted the waclaw-cc branch May 14, 2024 14:23