Don't raise a dispute on pre-validation and similar candidate unrelated errors #6057
Comments
After switching a validator to a new server we had The validator is running on 0.9.29 with RocksDB, and the db was synced from scratch with version 0.9.28. Different kinds of unique error logs:
I had to replace identifiable information to avoid breaching confidentiality. Let me know if you would like to have this information; I should be able to reproduce it with one of our public validators, where I can share all the logs.
Thank you @tugytur! This is very valuable feedback, we will look into it. What does switching to a new server mean exactly? Anything suspicious in logs/syslogs at the time of the incident/switch?
@pepyakin any ideas what could be causing such errors?
So this one apparently happens during prevalidation, which means it comes from a preparation worker (here). As can be seen from the logs, it receives some random-ish garbage. The communication between the preparation worker and the PVF host happens through a Unix domain socket (UDS). I don't think any corruption can happen to the data in flight there.
It's a freshly configured server that was never used as a validator. Snapshot restore was from a sync node that doesn't use the There were no indications in the logs of any issues in that time period. The server passes all hardware requirements and has nothing else besides the validator running on it.
Have you used the provided polkadot binary or compiled it yourself?
I've used the provided binary.
Apparently disputes are raised due to this error:
With pre-checking in place, we should not raise disputes on such errors. Disputes are about invalid candidates, and at this point we have not even looked at the candidate, so we cannot know whether it is valid or not. With pre-checking enabled we can be reasonably sure that such errors will not be a common problem, so Polkadot should just quit, forcing the operator to fix the problem, or alternatively just not vote.
Obviously, if an attacker could find a way to trigger such problems from the outside, that would be bad, but I think we just have to prevent that from being possible. Raising disputes is certainly not the solution, as this would lead to the validator getting disabled, which would be similarly beneficial to the attacker.
With pre-checking in place, we should just assume that this is an operational issue and do our best to have the operator fix it.
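To make the proposed policy concrete, here is a minimal sketch of the decision rule described above. All names in it (`ValidationOutcome`, `Action`, `decide`) are hypothetical and chosen for illustration only; they are not the actual types used in the Polkadot node, and the real error taxonomy is more detailed.

```rust
/// Hypothetical, simplified outcome of trying to validate a candidate.
/// These variants are illustrative and do not mirror the real node types.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ValidationOutcome {
    /// The candidate was executed and found valid.
    Valid,
    /// The candidate was actually executed and found invalid.
    InvalidCandidate(String),
    /// Something failed before the candidate was looked at,
    /// e.g. the preparation worker received garbage.
    PreValidationError(String),
}

/// Hypothetical action the validator takes in response.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    VoteValid,
    RaiseDispute,
    /// Do not vote; report the problem so the operator can fix it.
    Abstain,
}

/// Only outcomes that say something about the candidate itself may
/// lead to a dispute. Candidate-unrelated errors never do.
pub fn decide(outcome: &ValidationOutcome) -> Action {
    match outcome {
        ValidationOutcome::Valid => Action::VoteValid,
        ValidationOutcome::InvalidCandidate(_) => Action::RaiseDispute,
        ValidationOutcome::PreValidationError(_) => Action::Abstain,
    }
}
```

In this sketch, abstaining stands in for both options mentioned above (quitting or simply not voting); which of the two is preferable is an operational decision the sketch does not settle.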
Error conditions currently triggering a dispute to reconsider
Fixing ambiguous worker death disputes is handled in these tickets:
#6041
paritytech/polkadot-sdk#767