Limit size of source_receipt_proofs inside ChunkStateWitness #11295

Open
jancionear opened this issue May 13, 2024 · 11 comments
Labels: A-stateless-validation (Area: stateless validation)

Comments

@jancionear
Contributor

There could be a lot of large incoming receipts, which could cause the source_receipt_proofs field inside ChunkStateWitness to be really large. There are known problems with distributing large instances of ChunkStateWitness, so we should analyze how large the incoming receipts can be, and add a size limit to make sure that their size stays reasonable.

Refs: #10259, #11103, #11274

/cc @wacban @Longarithm, I know you already spent some time thinking about incoming receipts.

I think we should at least look into it before stateless validation launch, to estimate how bad things can get.

@jancionear jancionear added the A-stateless-validation Area: stateless validation label May 13, 2024
@jancionear
Contributor Author

One solution would be to allow each shard to send out large receipts only on every nth block, round-robin style.

Something like:

if block_height % num_shards == my_shard_id {
    // allowed to send out 4MB of receipts
} else {
    // allowed to send out 50kB of receipts, the rest of the outgoing receipts should wait in a queue
}

This would ensure that incoming receipts from one block height are at most 4MB + (num_shards - 1) * 50kB in size.

There's still a problem when we have missing chunks and the incoming receipts are from multiple block heights, but this could be solved by adding a condition that a chunk is allowed to send out large receipts only when the previous chunk wasn't missing.
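A minimal sketch of what that per-height allowance could look like, using the numbers above (the constants, names and the missing-chunk guard are illustrative, not an actual nearcore API):

```rust
/// Illustrative only: how many bytes of outgoing receipts a shard may send
/// at a given height, following the round-robin idea above.
const BIG_LIMIT: u64 = 4_000_000; // ~4MB on the shard's "turn"
const SMALL_LIMIT: u64 = 50_000; // ~50kB otherwise

fn outgoing_size_limit(
    block_height: u64,
    my_shard_id: u64,
    num_shards: u64,
    prev_chunk_missing: bool,
) -> u64 {
    // Only take the big allowance when the previous chunk wasn't missing,
    // so a shard recovering from missed chunks can't double up on large receipts.
    if block_height % num_shards == my_shard_id && !prev_chunk_missing {
        BIG_LIMIT
    } else {
        SMALL_LIMIT
    }
}
```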

@jancionear jancionear self-assigned this May 14, 2024
@wacban
Contributor

wacban commented May 15, 2024

@jancionear I think it's best to reason about the aggregated sum of the receipts rather than just the size of an individual receipt. At the end of the day it's the total size of all receipts that contributes to the state witness.

With that in mind I would suggest deploying a new metric to a canary node showing the typical total size of outgoing receipts per receiver shard. Assuming that on average this metric remains somewhat reasonable, we can then consider what you suggested - limiting the number of bytes a shard is allowed to send to another shard in one block, with a round-robin extra allowance.
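For example, something like the following with the prometheus crate (the metric name and labels are made up for illustration; in practice this would live in nearcore's own metrics module):

```rust
use once_cell::sync::Lazy;
use prometheus::{register_histogram_vec, HistogramVec};

// Hypothetical metric: total size of outgoing receipts per (sender, receiver) shard pair.
static OUTGOING_RECEIPTS_SIZE: Lazy<HistogramVec> = Lazy::new(|| {
    register_histogram_vec!(
        "near_outgoing_receipts_total_size",
        "Total size in bytes of outgoing receipts per receiver shard, per chunk",
        &["shard_id", "receiver_shard_id"]
    )
    .unwrap()
});

// At chunk production time, something like:
// OUTGOING_RECEIPTS_SIZE
//     .with_label_values(&[&shard_id.to_string(), &receiver_shard.to_string()])
//     .observe(total_size_bytes as f64);
```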

@jancionear
Contributor Author

jancionear commented May 20, 2024

Another thing to consider is smaller receipts being blocked by large ones. If we were to just process receipts in FIFO order, waiting until an outgoing receipt can be sent out, we could end up in a situation where the first receipt in the queue is really large, and we'd wait for a few blocks until we're able to send it out, without sending anything else in the meantime. A single big receipt blocks all small receipts.

To fix this we could have two separate queues of outgoing receipts - a queue for big ones that can only be sent in the special blocks, and another for small ones that can be sent from any block.
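Roughly something like this (an illustrative sketch; the types and the 50 kB threshold are assumptions, not actual nearcore structures):

```rust
use std::collections::VecDeque;

/// Hypothetical receipt type; only the serialized size matters for this sketch.
struct OutgoingReceipt { size: u64 /* , payload, receiver, ... */ }

const SMALL_RECEIPT_LIMIT: u64 = 50_000;

/// Two outgoing queues: small receipts can go out at every height,
/// big ones only when this shard has the "big sender" turn.
struct OutgoingQueues {
    small: VecDeque<OutgoingReceipt>,
    big: VecDeque<OutgoingReceipt>,
}

impl OutgoingQueues {
    fn push(&mut self, receipt: OutgoingReceipt) {
        if receipt.size <= SMALL_RECEIPT_LIMIT {
            self.small.push_back(receipt);
        } else {
            self.big.push_back(receipt);
        }
    }

    /// Pop receipts until the byte budget for this height is exhausted.
    fn drain(&mut self, mut budget: u64, big_turn: bool) -> Vec<OutgoingReceipt> {
        let mut out = Vec::new();
        if big_turn {
            Self::drain_queue(&mut self.big, &mut budget, &mut out);
        }
        Self::drain_queue(&mut self.small, &mut budget, &mut out);
        out
    }

    fn drain_queue(
        q: &mut VecDeque<OutgoingReceipt>,
        budget: &mut u64,
        out: &mut Vec<OutgoingReceipt>,
    ) {
        // Keep popping while the next receipt still fits into the remaining budget.
        while q.front().map_or(false, |r| r.size <= *budget) {
            let r = q.pop_front().unwrap();
            *budget -= r.size;
            out.push(r);
        }
    }
}
```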

@wacban
Contributor

wacban commented May 23, 2024

My current take for it would be as follows:

  • Let's pick a target maximum size of source_receipt_proofs and call it MAX.
  • Let's set a reasonable limit on the number of bytes that one shard can send to another. I think this limit can be as high as MAX or some high percentage thereof (e.g. 75%). This way we still allow full throughput in the case where two shards just exchange a lot of bytes. It can be done by leveraging congestion control. However, while this may work in the average case and in the case where one shard spams another, it wouldn't cover the worst case where all shards spam a single target shard.
  • In the worst case scenario the state witness would still exceed the size limit and may not be distributed in time to the chunk validators. We can rely on a combination of other solutions to handle that situation:

@jancionear
Contributor Author

jancionear commented May 23, 2024

> Thanks to #11344 eventually a new chunk will be produced in the affected shard and we can move past the troublesome state witness.

Does #11344 help with large chunk state witnesses? My understanding was that it helps when it takes a long time to apply a chunk, but there could still be a situation when a chunk isn't endorsed in time because the validator didn't receive a witness for this chunk in time.

@wacban
Contributor

wacban commented May 23, 2024

> Does #11344 help with large chunk state witnesses? My understanding was that it helps when it takes a long time to apply a chunk, but there could still be a situation when a chunk isn't endorsed in time because the validator didn't receive a witness for this chunk in time.

I would hope so but definitely needs testing and checking. Perhaps you can introduce a way to manually increase the size of the state witness (good for testing in forknet) or slow down the state witness distribution (good for testing in nayduck) and see if the chain can recover? I think both are worth doing. You can follow the example of slow_chunk.py where neard is compiled with the test_features feature and you can add adversarial behaviour, custom host functions or whatever else you need for simulating difficult behaviour.

@jancionear
Contributor Author

jancionear commented May 24, 2024

During the last meeting Bowen said that #11344 might not help with large witnesses, we might need a separate fix to deal with witnesses that take a long time to distribute.

@jancionear
Contributor Author

Even if we can recover from extra large witnesses, I feel that we still need to properly limit witness size. A malicious actor could try to cause the worst case scenario on every block, which would make the chain struggle.

@jancionear
Contributor Author

jancionear commented Jun 6, 2024

The approach from #11295 (comment), where one shard is allowed to send more than the others, isn't the best, because receipts may have to wait for up to num_shards blocks before being sent.

Maybe it would be possible to make a better solution by having the shards negotiate how much they're allowed to send out at each block height. At each height every shard would publish an intent to send out X MB of receipts to some shard. Then at the next height every shard sees how much every other shard wants to send out, and can deterministically decide how much it is allowed to send. In scenarios of low congestion this would allow large receipts to be sent in ~2 blocks, instead of num_shards blocks. With high congestion the deterministic algorithm would determine who is allowed to send out a large receipt at each height, in a fair way.

That sounds like a better solution, although it's more complex, so for now we can go with the simple round robin.
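A rough sketch of how such a deterministic split could look (the names, the constant, and the rotation-based fairness rule are illustrative assumptions, not a worked-out design):

```rust
/// Hypothetical per-receiver cap on incoming receipt bytes per height.
const MAX_INCOMING_BYTES: u64 = 4_718_592;

/// `intents[s]` = bytes shard `s` declared (at the previous height) it wants to
/// send to the receiver shard. Returns `grants[s]` = bytes shard `s` may send now.
/// Every shard can compute this deterministically from the published intents.
fn allocate(intents: &[u64], height: u64) -> Vec<u64> {
    let n = intents.len();
    let mut grants = vec![0u64; n];
    let mut remaining = MAX_INCOMING_BYTES;
    // Rotate the starting sender with the height so every shard is periodically first in line.
    let start = (height as usize) % n;
    for i in 0..n {
        let s = (start + i) % n;
        let granted = intents[s].min(remaining);
        grants[s] = granted;
        remaining -= granted;
    }
    grants
}
```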

github-merge-queue bot pushed a commit that referenced this issue Jun 7, 2024
…ize of source_receipt_proofs under control (#11492)

This is a basic fix for: #11295

The problem is that the size of `source_receipt_proofs` could be really
large in some scenarios. If all 6 shards send a 4MB outgoing receipt to
shard 0, then `source_receipt_proofs` for shard 0 will be of size 6 *
4MB = 24MB.
That's way too big; the network probably won't be able to distribute
that in time. And as we add more shards to the network, the effect will
get worse and worse.

This fix deals with the problem by allowing only one chosen shard to
send large receipts to the other shard. All other shards are only
allowed to send ~100kB of receipts. So instead of 6 shards sending 4MB,
we end up with 5 shards sending 100kB and one shard sending 4MB, which
adds up to 4.5MB, much more manageable.

The mapping of "who is able to send a lot of outgoing receipts to whom"
changes with every block height:


![image](https://github.com/near/nearcore/assets/149345204/3b571d7c-da24-4cd9-ad8f-19686fc0f055)

In this example at block height 2:
* shard 0 can send:
    * 100kB of receipts to shard 0
    * 100kB of receipts to shard 1
    * 4.5MB of receipts to shard 2
* shard 1 can send:
    * 4.5MB of receipts to shard 0
    * 100kB of receipts to shard 1
    * 100kB of receipts to shard 2
* shard 2 can send:
    * 100kB of receipts to shard 0
    * 4.5MB of receipts to shard 1
    * 100kB of receipts to shard 2

At every height a receiving shard will receive large receipts from only
one shard, so the size of `source_receipt_proofs` stays under control.
The mapping changes, so every large receipt will eventually be sent out
when the mapping allows sending it to the destination.

The new limits are:
* `outgoing_receipts_usual_size_limit`: 102_400 (100kiB)
* `outgoing_receipts_big_size_limit`: 4_718_592 (4.5MiB - a bit larger
than the 4MiB receipt limit to make sure that 4MiB receipts can get
through)
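For illustration, one rotation rule that is consistent with the example above (the exact mapping function used in #11492 may differ):

```rust
/// At each height, shard `sender` gets the big allowance towards exactly one
/// receiver, and the assignment rotates with the block height.
fn big_receipt_receiver(sender: u64, height: u64, num_shards: u64) -> u64 {
    (sender + height) % num_shards
}

// E.g. with 3 shards at height 2: shard 0 -> shard 2, shard 1 -> shard 0,
// shard 2 -> shard 1, so each receiver gets at most one 4.5 MiB sender
// plus (num_shards - 1) * 100 kiB from the others.
```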

### Flaws

This is a basic solution which has some flaws. It limits the witness
size, but it affects throughput in certain scenarios.
It can serve as a starting point for further improvements, something
that we can get in before the code freeze.

* Flaw 1: big receipts can block small receipts
A shard tries to send outgoing receipts in the order in which they were
created. When a large receipt is at the front of the queue, the shard
won't send anything until it can send out this large receipt. This means
that the shard might not send out anything for a few blocks.
This could be fixed by having a separate queue for large outgoing
receipts.
* Flaw 2: missing chunks
When a chunk is missing, the next chunk receives receipts from two block
heights. This means that it could receive two 4MB receipts. This could
be fixed by disallowing sending large receipts to a shard that just had
missing chunks.

### TODO
The implementation is pretty much ready, I should probably write some
tests, but for now I have to do other stuff.
Posting the PR as is for now.
@jancionear
Contributor Author

In #11492 I implemented a basic receipt size limit using the approach where one chosen shard is allowed to send out more at each block height, while other shards wait for their turn.

It limits the receipts, but there are some flaws.
We had a meeting discussing it, here's a short summary:

The first problem is that small receipts can be blocked by big receipts. If a small receipt is enqueued after a big one, then the small one will have to wait until the big one is sent out, which happens only once every num_shards blocks.

Another problem is that the change might not play well with congestion control. Congestion control reacts aggressively to buffered receipts. Once there is 500 TGas worth of enqueued receipts, the shard is considered fully congested and cannot receive any more receipts. Having 500 TGas of enqueued receipts could become fairly common because of small receipts getting stuck behind the big ones.

One way to solve the problem of small receipts getting stuck would be to have a separate queue for large receipts. But this would be problematic for the transaction priorities NEP. With transaction priorities we would like to send receipts with higher priority before the ones with low priority. What should happen when a big receipt has higher priority, but we're unable to send it at this block height? Should we send a smaller receipt with lower priority? Or wait for the big receipt? Two queues cause a headache there.

There was also the question of whether we need this kind of limit at all. If we reduced the receipt size limit to 1.5 MiB, then with six shards we could receive at most 6*1.5 MiB = 9 MiB of incoming receipts, which sounds somewhat manageable.
I believe that having a limit on incoming receipts to a single shard is consistent with the sharded design of NEAR. A single shard can only process so much, and can only receive so much. As we scale horizontally and keep adding more shards, the issue will become more pronounced. With 100 shards we could have 150 MiB of incoming receipts, which isn't going to work. We need to incorporate a limit on incoming data into our design sooner or later.

There is also the concern of DoS attacks. What happens when someone submits a lot of large receipts aimed at a single shard? Cross shard throughput is limited, so it will cause a lot of congestion. I think that gas-based congestion control has to deal with the same problem, so maybe there can be some universal solution to discourage sending a lot of data to a single shard - higher gas prices or something like that.

@jancionear
Contributor Author

Because of the trouble with transaction priorities I became less enthusiastic about the two-queue solution. I think it would help in the short term, increasing throughput and lowering latency in the presence of large receipts, but it's incompatible with transaction priorities.

But I have another idea that should work better: partial receipts
A lot of headache is caused by the fact that we can only send whole receipts, which can be really large. What if we added a way to split a large receipt into many small partial receipts? The sender shard could send a limited number of partial receipts at every block height instead of sending one large receipt. This way we wouldn't face the dilemma of whether to send a small receipt with low priority before the large receipt. Every receipt (or at least a part of one) is sendable at every height, so we can have a single queue/buffer of outgoing partial receipts and send some of them out to the receiver shard at every block height. The receiver has to reconstruct the large receipt from the partial ones, but that sounds doable.
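A minimal sketch of the splitting/reassembly, assuming the receipt is just a byte blob with an id (none of these types or the 100 kiB part size exist in nearcore today):

```rust
/// Illustrative part size; in practice it would be tied to the per-height byte budget.
const PART_SIZE: usize = 100 * 1024;

struct PartialReceipt {
    receipt_id: [u8; 32],
    part_index: usize,
    total_parts: usize,
    data: Vec<u8>,
}

/// Sender side: split the serialized receipt into parts that fit the budget.
fn split_receipt(receipt_id: [u8; 32], bytes: &[u8]) -> Vec<PartialReceipt> {
    let total_parts = (bytes.len() + PART_SIZE - 1) / PART_SIZE;
    bytes
        .chunks(PART_SIZE)
        .enumerate()
        .map(|(part_index, chunk)| PartialReceipt {
            receipt_id,
            part_index,
            total_parts,
            data: chunk.to_vec(),
        })
        .collect()
}

/// Receiver side: once all parts for a receipt have arrived, reassemble it.
/// (A real implementation would also validate ids, duplicates and ordering.)
fn try_reassemble(mut parts: Vec<PartialReceipt>) -> Option<Vec<u8>> {
    let total = parts.first()?.total_parts;
    if parts.len() != total {
        return None; // still waiting for more parts
    }
    parts.sort_by_key(|p| p.part_index);
    Some(parts.into_iter().flat_map(|p| p.data).collect())
}
```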
