Limit size of `source_receipt_proofs` inside `ChunkStateWitness` #11295
One solution would be to allow shards to send out large receipts on every nth block, round robin style. Something like:

```rust
if block_height % num_shards == my_shard_id {
    // allowed to send out 4MB of receipts
} else {
    // allowed to send out 50kB of receipts; the rest of the outgoing
    // receipts should wait in a queue
}
```

This would ensure that incoming receipts from one block height are at most 4MB + (num_shards - 1) * 50kB in size. There's still a problem when we have missing chunks and the incoming receipts are from multiple block heights, but this could be solved by adding a condition that a chunk is allowed to send out large receipts only when the previous chunk wasn't missing.
@jancionear I think it's best to reason about the aggregated sum of the receipts rather than just the size of an individual receipt. At the end of the day it's the total size of all receipts that contributes to the state witness. With that in mind I would suggest deploying a new metric to a canary node showing the typical total size of outgoing receipts per receiver shard. Assuming that on average this metric remains somewhat reasonable, we can then consider what you suggested - limiting the number of bytes a shard is allowed to send to another shard in one block, with a round-robin extra allowance.
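A minimal sketch of the aggregation such a metric would report, assuming hypothetical simplified types (`OutgoingReceipt`, `ShardId` - the real nearcore types differ, and the receiver shard is derived from the receiver account via the shard layout):

```rust
use std::collections::HashMap;

type ShardId = u64;

/// Hypothetical simplified outgoing receipt, illustrative only.
struct OutgoingReceipt {
    receiver_shard: ShardId,
    encoded_size: usize,
}

/// Total bytes of outgoing receipts per receiver shard for one chunk -
/// the aggregate a canary-node metric would export per block.
fn outgoing_bytes_per_shard(receipts: &[OutgoingReceipt]) -> HashMap<ShardId, usize> {
    let mut totals: HashMap<ShardId, usize> = HashMap::new();
    for receipt in receipts {
        *totals.entry(receipt.receiver_shard).or_insert(0) += receipt.encoded_size;
    }
    totals
}
```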
Another thing to consider is smaller receipts being blocked by large ones. If we were to just process receipts in FIFO order, waiting until an outgoing receipt can be sent out, we could end up in a situation where the first receipt in the queue is really large, and we'd wait for a few blocks until we're able to send it out, without sending anything else during that time. A single big receipt blocks all small receipts. To fix this we could keep two separate queues of outgoing receipts - one for big receipts that can only be sent in the special blocks, and another for small ones that can be sent from any block.
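A sketch of that two-queue idea, with a hypothetical `BIG_RECEIPT_THRESHOLD` constant (not a real nearcore parameter) and receipts modeled as plain byte vectors: small receipts keep flowing every block, while big ones wait in their own queue for the special blocks.

```rust
use std::collections::VecDeque;

/// Hypothetical threshold separating "small" from "big" receipts.
const BIG_RECEIPT_THRESHOLD: usize = 100 * 1024; // 100 kiB, illustrative

#[derive(Default)]
struct OutgoingQueues {
    small: VecDeque<Vec<u8>>, // may be sent at any block height
    big: VecDeque<Vec<u8>>,   // may be sent only in the special blocks
}

impl OutgoingQueues {
    fn enqueue(&mut self, receipt: Vec<u8>) {
        if receipt.len() > BIG_RECEIPT_THRESHOLD {
            self.big.push_back(receipt);
        } else {
            self.small.push_back(receipt);
        }
    }

    /// Drain receipts to send at this height, within a byte budget.
    /// `big_allowed` is true on the round-robin heights where this shard
    /// may send large receipts; small receipts are never blocked.
    fn drain(&mut self, big_allowed: bool, mut budget: usize) -> Vec<Vec<u8>> {
        let mut out = Vec::new();
        if big_allowed {
            while let Some(receipt) = self.big.front() {
                if receipt.len() > budget {
                    break;
                }
                budget -= receipt.len();
                out.push(self.big.pop_front().unwrap());
            }
        }
        while let Some(receipt) = self.small.front() {
            if receipt.len() > budget {
                break;
            }
            budget -= receipt.len();
            out.push(self.small.pop_front().unwrap());
        }
        out
    }
}
```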
My current take on it would be as follows:
Does #11344 help with large chunk state witnesses? My understanding was that it helps when it takes a long time to apply a chunk, but there could still be a situation where a chunk isn't endorsed in time because the validator didn't receive the witness for this chunk in time.
I would hope so, but it definitely needs testing and checking. Perhaps you can introduce a way to manually increase the size of the state witness (good for testing in forknet) or slow down the state witness distribution (good for testing in nayduck) and see if the chain can recover? I think both are worth doing. You can follow the example of slow_chunk.py, where neard is compiled with the test_features feature and you can add adversarial behaviour, custom host functions or whatever else you need for simulating difficult conditions.
During the last meeting Bowen said that #11344 might not help with large witnesses; we might need a separate fix to deal with witnesses that take a long time to distribute.
Even if we can recover from extra-large witnesses, I feel that we still need to properly limit witness size. A malicious actor could try to cause the worst-case scenario on every block, which would make the chain struggle.
The approach from #11295 (comment), where one shard is allowed to send more than the others, isn't the best, because the receipts have to wait for up to num_shards blocks until their turn comes around. Maybe it would be possible to make a better solution by having the shards negotiate how much they're allowed to send out at each block height. At each height every shard would publish an intent of sending out X MB of receipts to some shard. Then on the next height every shard sees how much every other shard wants to send out, and can deterministically decide how much it's allowed to send. In scenarios of low congestion this would allow large receipts to be sent in ~2 blocks, instead of up to num_shards blocks.

That sounds like a better solution, although it's more complex, so for now we can go with the simple round robin.
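A sketch of how that negotiation could resolve deterministically, under assumptions not in the thread (a fixed per-receiver incoming budget, grants made in shard order): every shard runs the same allocation over the intents published at the previous height, so all shards agree on the budgets without extra communication.

```rust
/// Hypothetical per-receiver incoming byte budget per block
/// (mirroring the 4.5 MiB big-receipt limit from the PR below).
const INCOMING_BUDGET: u64 = 4_718_592;

/// `intents[s]` = bytes that shard `s` declared (at the previous height)
/// it wants to send to this receiver. Returns how many bytes each shard
/// may actually send. Deterministic: every shard that sees the same
/// intents computes the same allowances.
fn allocate(intents: &[u64]) -> Vec<u64> {
    let mut remaining = INCOMING_BUDGET;
    let mut allowances = vec![0u64; intents.len()];
    // Grants are made in fixed shard order here; a real design would need
    // fairness across heights, e.g. rotating the starting shard.
    for (shard, &wanted) in intents.iter().enumerate() {
        let grant = wanted.min(remaining);
        allowances[shard] = grant;
        remaining -= grant;
    }
    allowances
}
```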
…ize of source_receipt_proofs under control (#11492)

This is a basic fix for: #11295

The problem is that the size of `source_receipt_proofs` could be really large in some scenarios. If all 6 shards send a 4MB outgoing receipt to shard 0, then `source_receipt_proofs` for shard 0 will be of size 6 * 4MB = 24MB. That's way too big - the network probably won't be able to distribute that in time. And as we add more shards to the network, the effect will get worse and worse.

This fix deals with the problem by allowing only one chosen shard to send large receipts to another shard. All other shards are only allowed to send ~100kB of receipts. So instead of 6 shards sending 4MB each, we end up with 5 shards sending 100kB and one shard sending 4MB, which adds up to 4.5MB - much more manageable.

The mapping of "who is able to send a lot of outgoing receipts to whom" changes with every block height:

![image](https://github.com/near/nearcore/assets/149345204/3b571d7c-da24-4cd9-ad8f-19686fc0f055)

In this example at block height 2:
* shard 0 can send:
  * 100kB of receipts to shard 0
  * 100kB of receipts to shard 1
  * 4.5MB of receipts to shard 2
* shard 1 can send:
  * 4.5MB of receipts to shard 0
  * 100kB of receipts to shard 1
  * 100kB of receipts to shard 2
* shard 2 can send:
  * 100kB of receipts to shard 0
  * 4.5MB of receipts to shard 1
  * 100kB of receipts to shard 2

At every height a receiving shard will receive large receipts from only one shard, so the size of `source_receipt_proofs` stays under control. The mapping changes with every height, so every large receipt will eventually be sent out when the mapping allows it to be sent to its destination.

The new limits are:
* `outgoing_receipts_usual_size_limit`: 102_400 (100kiB)
* `outgoing_receipts_big_size_limit`: 4_718_592 (4.5MiB - a bit larger than the 4MiB receipt limit, to make sure that 4MiB receipts can get through)

### Flaws

This is a basic solution which has some flaws. It limits the witness size, but it affects throughput in certain scenarios. It can serve as a starting point for further improvements, something that we can get in before the code freeze.

* Flaw 1: big receipts can block small receipts

  A shard tries to send outgoing receipts in the order in which they were created. When a large receipt is at the front of the queue, the shard won't send anything until it can send out this large receipt. This means that the shard might not send out anything for a few blocks. This could be fixed by having a separate queue for large outgoing receipts.

* Flaw 2: missing chunks

  When a chunk is missing, the next chunk receives receipts from two block heights. This means that it could receive two 4MB receipts. This could be fixed by disallowing sending large receipts to a shard that just had missing chunks.

### TODO

The implementation is pretty much ready. I should probably write some tests, but for now I have to do other stuff. Posting the PR as is for now.
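The rotating mapping in the figure can be expressed as a simple modular rotation. This is a sketch consistent with the height-2 example above, not necessarily how the PR computes it; the limit constants are taken from the PR description:

```rust
type ShardId = u64;

/// The one receiver that `sender` may send big receipts to at this height;
/// every other receiver gets at most the usual ~100 kiB. Matches the example:
/// at height 2 with 3 shards, shard 1 -> shard 0, shard 2 -> shard 1,
/// shard 0 -> shard 2. For a fixed height this map is a bijection, so each
/// receiver has exactly one big sender.
fn big_receipt_receiver(sender: ShardId, height: u64, num_shards: u64) -> ShardId {
    (sender + height) % num_shards
}

/// Byte limit for receipts sent from `sender` to `receiver` at `height`.
fn outgoing_size_limit(sender: ShardId, receiver: ShardId, height: u64, num_shards: u64) -> u64 {
    if big_receipt_receiver(sender, height, num_shards) == receiver {
        4_718_592 // outgoing_receipts_big_size_limit: 4.5 MiB
    } else {
        102_400 // outgoing_receipts_usual_size_limit: 100 kiB
    }
}
```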
In #11492 I implemented a basic receipt size limit using the approach where one chosen shard is allowed to send out more at each block height, while other shards wait for their turn. It limits the receipts, but there are some flaws.

The first problem is that small receipts can be blocked by big receipts. If a small receipt is enqueued after a big one, then the small one will have to wait until the big one is sent out, which happens only once every num_shards blocks.

Another problem is that the change might not play well with congestion control. Congestion control reacts aggressively to buffered receipts. Once there is 500 TGas worth of enqueued receipts, the shard is considered fully congested and cannot receive any more receipts. Having 500 TGas of enqueued receipts could become fairly common because of small receipts getting stuck behind the big ones.

One way to solve the problem of small receipts getting stuck would be to have a separate queue for large receipts. But this would be problematic for the transaction priorities NEP. With transaction priorities we would like to send receipts with higher priority before the ones with lower priority. What should happen when a big receipt has higher priority, but we're unable to send it at this block height? Should we send a smaller receipt with lower priority? Or wait for the big receipt? Two queues cause a headache there.

There was also the question of whether we need this kind of limit at all. If we reduced the receipt size limit to 1.5 MiB, then with six shards we could receive at most 6 * 1.5 MiB = 9 MiB of incoming receipts, which sounds somewhat manageable.

There is also the concern of DoS attacks. What happens when someone submits a lot of large receipts aimed at a single shard? Cross-shard throughput is limited, so it will cause a lot of congestion. I think that gas-based congestion control has to deal with the same problem, so maybe there can be some universal solution to discourage sending a lot of data to a single shard - higher gas prices or something like that.
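To make the congestion-control interaction concrete, a toy version of the buffered-gas signal described above, assuming only the 500 TGas threshold stated in the comment (the real nearcore congestion control is more involved):

```rust
/// Threshold from the discussion: 500 TGas of enqueued receipts marks the
/// shard as fully congested. Purely illustrative.
const CONGESTION_THRESHOLD_GAS: u64 = 500_000_000_000_000; // 500 TGas

/// Congestion level in [0.0, 1.0]; at 1.0 the shard stops accepting new
/// incoming receipts. Small receipts stuck behind a big one inflate
/// `buffered_gas`, pushing this toward 1.0 even without real overload.
fn congestion_level(buffered_gas: u64) -> f64 {
    (buffered_gas as f64 / CONGESTION_THRESHOLD_GAS as f64).min(1.0)
}
```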
Because of the trouble with transaction priorities I became less enthusiastic about the two-queue solution. I think it would help in the short term, increasing throughput and lowering latency in the presence of large receipts, but it's incompatible with transaction priorities. But I have another idea that should work better: partial receipts.
There could be a lot of large incoming receipts, which could cause the `source_receipt_proofs` field inside `ChunkStateWitness` to be really large. There are known problems with distributing large instances of `ChunkStateWitness`, so we should analyze how large the incoming receipts can be, and add a size limit to make sure that their size stays reasonable.

Refs: #10259, #11103, #11274
/cc @wacban @Longarithm, I know you already spent some time thinking about incoming receipts.
I think we should at least look into it before stateless validation launch, to estimate how bad things can get.