feat(congestion_control) - handling missing chunks #11274

wacban · 2024-05-09T13:00:48Z

When a shard misses multiple chunks in a row, we want to treat that shard as congested. This prevents outgoing receipts to that shard from accumulating and then blowing up the state witness size.

I decided not to embed the information about missing chunks in the congestion info itself; instead, I added a new struct called ExtendedCongestionInfo. The Block now constructs the congestion info, extends it with the information about missing chunks, and provides this new struct to the runtime.

Since the congestion level can no longer be calculated without the missing chunks information, I added a missed_chunks_count argument to all methods that rely on the congestion level. This makes sure that users of those structs do not forget about the missing chunks: the compiler will reject any call that omits the argument. In the runtime, the ExtendedCongestionInfo struct acts as a helper that keeps things as convenient as they used to be.
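As a rough illustration of this API shape (not the PR's actual code), the sketch below uses simplified, hypothetical types: `CongestionInfo` holds a single pre-normalized `base_level` field as a stand-in for the real indicators, and the argument is spelled `missed_chunks_count` as in the diff. The mapping of 10 missed chunks to full congestion is the value mentioned in this description.

```rust
/// Hypothetical, simplified stand-in for the per-shard congestion info.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct CongestionInfo {
    /// Congestion from the pre-existing indicators, assumed normalized to [0, 1].
    pub base_level: f64,
}

impl CongestionInfo {
    /// There is no default for `missed_chunks_count`, so every caller has to
    /// supply it; a call without the argument does not compile.
    pub fn congestion_level(&self, missed_chunks_count: u64) -> f64 {
        // Assumed mapping: 10 missed chunks in a row means full congestion.
        let missed = (missed_chunks_count as f64 / 10.0).min(1.0);
        self.base_level.max(missed)
    }
}

/// Congestion info bundled with the missed chunks count, so that code in the
/// runtime can ask for the congestion level without passing the count around.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct ExtendedCongestionInfo {
    pub congestion_info: CongestionInfo,
    pub missed_chunks_count: u64,
}

impl ExtendedCongestionInfo {
    pub fn new(congestion_info: CongestionInfo, missed_chunks_count: u64) -> Self {
        Self { congestion_info, missed_chunks_count }
    }

    /// Convenience wrapper that forwards the stored count.
    pub fn congestion_level(&self) -> f64 {
        self.congestion_info.congestion_level(self.missed_chunks_count)
    }
}
```

With this shape, the block-level code builds the extended struct once, and downstream callers only use `ExtendedCongestionInfo::congestion_level()`.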

The congestion level itself is now the maximum of four values: the first three as before and a new one for missing chunks. In this PR, 10 missed chunks in a row lead to full congestion; that number is to be adjusted based on data. Other changes could also be considered, such as adding the missing chunks congestion to the max of the others. I'm open to suggestions here.
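The calculation can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: the constant name, the function names, the linear mapping, and the representation of the three existing inputs as a plain array are all hypothetical.

```rust
/// Hypothetical constant: consecutive missed chunks that count as full congestion.
pub const MAX_CONGESTION_MISSED_CHUNKS: u64 = 10;

/// Map a missed chunks count to a congestion value in [0, 1].
/// A linear mapping is assumed here; the PR text only fixes the endpoint
/// (10 missed chunks in a row means full congestion).
pub fn missed_chunks_congestion(missed_chunks_count: u64) -> f64 {
    let ratio = missed_chunks_count as f64 / MAX_CONGESTION_MISSED_CHUNKS as f64;
    ratio.clamp(0.0, 1.0)
}

/// Overall congestion level: the maximum of the three pre-existing values
/// (passed in already normalized to [0, 1]) and the missed chunks component.
pub fn congestion_level(existing_levels: [f64; 3], missed_chunks_count: u64) -> f64 {
    existing_levels
        .iter()
        .copied()
        .fold(missed_chunks_congestion(missed_chunks_count), f64::max)
}
```

Taking the maximum means missed chunks only matter once they exceed the other indicators. The alternative mentioned above, adding the missed chunks component to the max of the others, would make them contribute at every level and would need a final clamp to 1.0.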

It's in draft because I still need to add tests for this.

chain/chain/src/chain_update.rs

core/primitives/src/congestion_info.rs

jakmeier

Looks great! Fitted in really smoothly, almost as if we designed it with this addition in mind from the beginning. :)

Adding tests would make sense, yes. Please re-request a review if you add it in this PR, so I can take another look before merging.

jakmeier · 2024-05-09T16:26:58Z

core/primitives/src/block.rs

+            let congestion_info = chunk.congestion_info().unwrap_or_default();
+            let height_included = chunk.height_included();
+            let height_current = self.header().height();
+            let missed_chunks_count = height_current - height_included;


nit: consider using checked_sub with something like .expect("current cannot be older than included") to make it more obvious that the code technically has a panic path here but that we have semantic guarantees that make it unreachable.
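A minimal sketch of that suggestion, assuming both heights are plain `u64` values; the free function `missed_chunks_count` is a hypothetical helper used only for illustration and is not taken from the PR:

```rust
/// Hypothetical helper: number of blocks since the chunk was last included.
pub fn missed_chunks_count(height_current: u64, height_included: u64) -> u64 {
    // `checked_sub` returns `None` on underflow instead of wrapping, and the
    // `expect` message documents the invariant that makes the panic unreachable.
    height_current
        .checked_sub(height_included)
        .expect("current cannot be older than included")
}
```

With a plain `-` on `u64`, an underflow panics only when overflow checks are enabled (the default in debug builds) and wraps around otherwise, so `checked_sub` makes the behavior the same in every build profile.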

core/primitives/src/congestion_info.rs

jakmeier · 2024-05-09T16:46:41Z

core/primitives/src/congestion_info.rs

+        // TODO(congestion_control) Set missed chunks count correctly.
+        if self.congestion_level(0) < 1.0 {


Hm, how would this even work correctly?

I mean, here we set our own congestion info for other shards to read later. So in this code line, we cannot know how many chunks will be missed between now and then, right?
So I guess we would have to override before using it. But then it would have to be specified in the protocol, rather than giving clients the choice. 🤔

Or, based on today's discussion, maybe we should simply never activate an allowed shard based on missing chunks, right? Then 0 works here, just needs a comment to explain the thinking.

The only possible concern is that deadlocks come back if we deactivate round-robin.

But in my mind, that seems fine. The condition for deadlocks would be that one chunk makes no more progress at all. And what should we do in this case? Stopping traffic seems better than blowing up all shards with ever-growing numbers of receipts to buffer.

wacban · 2024-05-10T09:09:51Z

cc @Longarithm this is the missing chunks handling if you are interested.

wacban · 2024-05-13T10:40:06Z

> Adding tests would make sense, yes. Please re-request a review if you add it in this PR, so I can take another look before merging.

@jakmeier I added some simple tests, have a look.

jakmeier

tests look good, too :)


codecov · 2024-05-14T14:10:46Z

Codecov Report

Attention: Patch coverage is 95.93023%, with 7 lines in your changes missing coverage. Please review.

Project coverage is 71.02%. Comparing base (0a61a29) to head (e679a39).
Report is 8 commits behind head on master.

Files	Patch %	Lines
core/primitives/src/congestion_info.rs	95.10%	7 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #11274      +/-   ##
==========================================
+ Coverage   70.99%   71.02%   +0.03%     
==========================================
  Files         781      781              
  Lines      155505   155627     +122     
  Branches   155505   155627     +122     
==========================================
+ Hits       110407   110541     +134     
+ Misses      40323    40317       -6     
+ Partials     4775     4769       -6

Flag	Coverage Δ
backward-compatibility	`0.24% <0.00%> (-0.01%)`	⬇️
db-migration	`0.24% <0.00%> (-0.01%)`	⬇️
genesis-check	`1.40% <0.00%> (-0.01%)`	⬇️
integration-tests	`37.16% <46.51%> (-0.03%)`	⬇️
linux	`69.08% <90.69%> (+0.05%)`	⬆️
linux-nightly	`70.49% <95.93%> (+0.02%)`	⬆️
macos	`52.53% <89.88%> (+0.11%)`	⬆️
pytests	`1.62% <0.00%> (-0.01%)`	⬇️
sanity-checks	`1.42% <0.00%> (-0.01%)`	⬇️
unittests	`65.46% <95.90%> (+0.03%)`	⬆️
upgradability	`0.29% <0.00%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown.

wacban requested a review from jakmeier May 9, 2024 13:01


Base automatically changed from waclaw-adv-2 to master May 9, 2024 13:30

wacban force-pushed the waclaw-cc branch from 61e4aa7 to b8bc03d Compare May 9, 2024 13:36

jakmeier approved these changes May 9, 2024

wacban marked this pull request as ready for review May 10, 2024 09:09

wacban requested a review from a team as a code owner May 10, 2024 09:09

wacban requested a review from tayfunelmas May 10, 2024 09:09

wacban added 4 commits May 13, 2024 10:18

feat(congestion_control) - handling missing chunks

d70324c

tests and nits

9a54528

floats are hard

e808531

rebase

632eba8

wacban force-pushed the waclaw-cc branch from d21cb91 to 632eba8 Compare May 13, 2024 10:33

wacban requested a review from jakmeier May 13, 2024 10:40

clippy

1521a12

jakmeier approved these changes May 13, 2024

fix a test where congestion control kicked in and stopped some transactions

fa365ff

wacban enabled auto-merge May 13, 2024 13:19

jancionear mentioned this pull request May 13, 2024

Limit size of source_receipt_proofs inside ChunkStateWitness #11295

Open

ignore breaking test, added todo

e679a39

wacban added this pull request to the merge queue May 14, 2024

Merged via the queue into master with commit e3c8f1f May 14, 2024
28 of 29 checks passed

wacban deleted the waclaw-cc branch May 14, 2024 14:23

jakmeier mentioned this pull request May 24, 2024

[ProjectTracking]: congestion control near/near-one-project-tracking#48

Open

14 tasks
