[stateless_validation] Missing main transition state proof for old block #10964

Open
Tracked by #46
staffik opened this issue Apr 5, 2024 · 5 comments
Labels
A-stateless-validation Area: stateless validation Near Core

Comments

@staffik
Contributor

staffik commented Apr 5, 2024

This happens when running a stateless validation cluster with shard shuffling and many consecutive missing chunks.
It occurred on both statelessnet and adversenet, using the 84.2 statelessnet protocol release.
The error looks like this:

panicked at chain/client/src/stateless_validation/state_witness_producer.rs:183:25:
Missing main transition state proof for block B4uA4cB4BEu6dmKCBz3bJMpvZSqyLXF84HWExo7dynJ9 and shard 3

The likely cause is that the state transition data for chunk X was garbage-collected (GC-ed) after the node switched shards, and then, much later, that data was needed again for chunk X.
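To make the failure mode concrete, here is a minimal Rust sketch under simplifying assumptions: state transition data is stored per `(height, shard_id)`, garbage collection keeps a fixed window of `gc_num_epochs_to_keep` epochs, and producing a witness requires the main transition of the last chunk that was actually produced in the shard. `TransitionStore`, `WitnessError` and all method names are hypothetical, not nearcore's API. A real node also drops data for shards it stops tracking; this sketch folds that into the single retention window. Where the real code panics (as in the trace above), the sketch returns an error.

```rust
use std::collections::HashMap;

// Hypothetical error type; nearcore's real types differ.
#[derive(Debug, PartialEq)]
pub enum WitnessError {
    MissingMainTransition { height: u64, shard_id: u64 },
}

// Hypothetical, simplified store: (height, shard_id) -> opaque state proof.
pub struct TransitionStore {
    epoch_length: u64,
    gc_num_epochs_to_keep: u64,
    data: HashMap<(u64, u64), Vec<u8>>,
}

impl TransitionStore {
    pub fn new(epoch_length: u64, gc_num_epochs_to_keep: u64) -> Self {
        Self { epoch_length, gc_num_epochs_to_keep, data: HashMap::new() }
    }

    pub fn save(&mut self, height: u64, shard_id: u64, proof: Vec<u8>) {
        self.data.insert((height, shard_id), proof);
    }

    /// Drop every entry older than the retention window ending at `head`.
    pub fn gc(&mut self, head: u64) {
        let keep = self.epoch_length * self.gc_num_epochs_to_keep;
        let cutoff = head.saturating_sub(keep);
        self.data.retain(|&(height, _), _| height >= cutoff);
    }

    /// The witness for the next chunk needs the main transition of the last
    /// chunk actually produced in this shard, however long ago that was.
    pub fn main_transition(&self, last_chunk_height: u64, shard_id: u64) -> Result<&[u8], WitnessError> {
        self.data
            .get(&(last_chunk_height, shard_id))
            .map(|proof| proof.as_slice())
            .ok_or(WitnessError::MissingMainTransition { height: last_chunk_height, shard_id })
    }
}

fn main() {
    // Epochs of 10 heights; GC keeps the last 2 epochs (20 heights).
    let mut store = TransitionStore::new(10, 2);
    // Shard 3's last new chunk was applied at height 5 ...
    store.save(5, 3, vec![0xAB]);
    // ... then shard 3 keeps missing chunks while GC advances with the head.
    store.gc(40); // cutoff = 40 - 20 = 20, so height 5 is gone
    // Producing the witness for shard 3's next chunk now fails.
    match store.main_transition(5, 3) {
        Ok(_) => println!("state proof available"),
        Err(e) => println!("{e:?}"),
    }
}
```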

@bowenwang1996
Collaborator

I don't understand. With mainnet's epoch length, why would the state witness for a chunk be needed more than one epoch later? We can assume that a chunk will be produced every epoch.

@staffik
Contributor Author

staffik commented Apr 12, 2024

I think it can only happen with a very long run of consecutive missing chunks, something that probably won't happen on mainnet but did happen on statelessnet.
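The assumption that a chunk is produced in every epoch can be turned into a concrete bound. The Rust sketch below assumes every height has a block and that epoch `e` covers heights `e * epoch_length` through `(e + 1) * epoch_length - 1` (real epochs are less regular). Under that model, the longest run of missing chunks compatible with the assumption is `2 * epoch_length - 2` (a chunk at the first height of one epoch, the next at the last height of the following epoch), and any run of `2 * epoch_length - 1` or more necessarily covers a whole epoch. The function names are illustrative only.

```rust
// Simplified model (assumption): every height has a block and epoch `e`
// covers heights [e * epoch_length, (e + 1) * epoch_length - 1].

/// Longest run of consecutive missing chunks that still leaves at least one
/// chunk in every epoch: a chunk at the first height of epoch e, the next one
/// at the last height of epoch e + 1.
fn max_run_with_chunk_every_epoch(epoch_length: u64) -> u64 {
    2 * epoch_length - 2
}

/// True if missing chunks at heights [start, start + run_len - 1] cover an
/// entire epoch, i.e. the "one chunk per epoch" assumption is broken.
fn run_covers_full_epoch(start: u64, run_len: u64, epoch_length: u64) -> bool {
    if run_len == 0 {
        return false;
    }
    let end = start + run_len - 1;
    // First epoch start at or after `start` (ceiling division).
    let first_epoch_start = (start + epoch_length - 1) / epoch_length * epoch_length;
    first_epoch_start + epoch_length - 1 <= end
}

fn main() {
    let epoch_length = 10;
    let max_run = max_run_with_chunk_every_epoch(epoch_length);
    // Chunk at height 10, next at 29: heights 11..=28 missing, no full epoch.
    assert!(!run_covers_full_epoch(11, max_run, epoch_length));
    // One more missing chunk and, wherever the run starts, an epoch is covered.
    for start in 0..3 * epoch_length {
        assert!(run_covers_full_epoch(start, max_run + 1, epoch_length));
    }
    println!("max run keeping the assumption: {max_run}");
}
```

So while the assumption holds, the main transition a witness needs is at most `2 * epoch_length - 1` heights old; only a longer run of missing chunks, as seen on statelessnet, can push it further back.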

@staffik staffik self-assigned this Apr 12, 2024
@Longarithm
Member

We can assume that a chunk will be produced every epoch

@bowenwang1996 I don't feel confident relying on this assumption for design decisions...
With stateless validation, we are reducing the number of chunk producers (CPs) to only a few, so there is some probability of them colluding and simply not producing chunks for a whole epoch. Then our assumption would break, and I have no idea what else would break across the chain :)
AFAIU, we have also seen specific shards stall on statelessnet due to technical issues. Suppose a shard stalls on mainnet because of some weird bug (e.g. some chunk extra goes missing on all nodes, idk). If we don't debug it within a day, we again end up with every chunk in the epoch missing.

I think we need a specific defensive mechanism against this, for example, not finalising an epoch until each shard has at least one chunk in it. But again, that doesn't help with the attack above.
Or, even if we are confident in mainnet validators, I think we still need some workaround for our test chains.
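The proposed guard (not finalising an epoch until each shard has at least one chunk) could look roughly like the Rust sketch below. It assumes each of the epoch's blocks records, per shard, whether it included a new chunk, modeled here as a `Vec<bool>`; `every_shard_has_chunk` and `shards_missing_chunk` are hypothetical helpers, not existing nearcore functions.

```rust
/// Hypothetical guard for epoch finalisation. `chunk_masks[b][s]` is true if
/// block `b` of the epoch included a new chunk for shard `s`.
fn every_shard_has_chunk(num_shards: usize, chunk_masks: &[Vec<bool>]) -> bool {
    shards_missing_chunk(num_shards, chunk_masks).is_empty()
}

/// Shards that have not seen a single new chunk in the epoch so far;
/// these are the ones blocking finalisation (useful for logs or alerts).
fn shards_missing_chunk(num_shards: usize, chunk_masks: &[Vec<bool>]) -> Vec<usize> {
    (0..num_shards)
        .filter(|&s| {
            !chunk_masks
                .iter()
                .any(|mask| mask.get(s).copied().unwrap_or(false))
        })
        .collect()
}

fn main() {
    // Three blocks, three shards; shard 1 never gets a new chunk.
    let masks = vec![
        vec![true, false, false],
        vec![false, false, true],
        vec![true, false, true],
    ];
    assert!(!every_shard_has_chunk(3, &masks));
    assert_eq!(shards_missing_chunk(3, &masks), vec![1]);

    // Once a block with a chunk for shard 1 arrives, the epoch may end.
    let mut masks = masks;
    masks.push(vec![false, true, false]);
    assert!(every_shard_has_chunk(3, &masks));
    println!("epoch can be finalised");
}
```

As noted in the thread, this does not stop colluding chunk producers from withholding chunks; it only turns that case into the epoch not ending, instead of an epoch ending with no chunk in some shard.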
#11039 is also slightly relevant.

@walnut-the-cat
Contributor

So there is some probability of them colluding and just stopping producing chunks for the whole epoch.

What would they gain by doing so? Wouldn't they forfeit their rewards and risk getting kicked out?

@bowenwang1996
Collaborator

Like, don't finalise epoch until each shard has at least one chunk in it. But again, it doesn't help with attack above.

I think we should do that, but the mainnet launch does not have to block on it. Also, why doesn't it help with the attack? Yes, if all chunk producers collude they can prevent an epoch from ending, but that brings them no benefit; on the contrary, they won't get any reward if the epoch doesn't end.

@staffik staffik removed their assignment Jul 30, 2024

4 participants