This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Don't raise a dispute on pre-validation and similar candidate unrelated errors #6057

Closed
3 of 4 tasks
eskimor opened this issue Sep 26, 2022 · 7 comments · Fixed by #6103

eskimor (Member) commented Sep 26, 2022

Apparently disputes are raised due to this error:

Detected invalid candidate as an approval checker. reason=ExecutionError(\"prevalidation: Other(\\\"cannot deserialize module: InvalidMagic\\\")\")

With pre-checking in place, we should not raise disputes on such errors. Disputes are about invalid candidates, and at this point we have not even looked at the candidate, so we cannot know whether it is valid. With pre-checking enabled we can be reasonably sure that such errors will not be a common problem, so Polkadot should either just quit, forcing the operator to fix the problem, or simply not vote.

Obviously if an attacker could find a way to trigger such problems from the outside, that would be bad - but I think we just have to prevent that from being possible. Raising disputes is certainly not the solution, as this would lead to the validator getting disabled, which would be similarly beneficial to the attacker.

With pre-checking in place, we should just assume that this is an operational issue and do our best to have the operator fix it.
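The distinction argued for above can be sketched in Rust as follows. This is only an illustration of the idea, not the actual Polkadot code: the names `CheckOutcome`, `Action`, and `decide` are hypothetical and were made up for this example. The point is that a dispute is raised only when the candidate itself was examined and found invalid, while errors that occur before the candidate is looked at lead to abstaining.

```rust
/// Hypothetical outcome of an approval check (illustrative names only,
/// not the real Polkadot types).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CheckOutcome {
    /// The candidate was executed and found valid.
    Valid,
    /// The candidate was actually examined and found invalid.
    InvalidCandidate(String),
    /// Something failed before the candidate was looked at,
    /// for example a prevalidation error while preparing the PVF.
    InternalError(String),
}

/// Hypothetical reaction of the approval checker.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Action {
    Approve,
    RaiseDispute,
    /// Do not vote; surface the problem to the operator instead.
    Abstain,
}

/// Only a verdict about the candidate itself may lead to a dispute.
pub fn decide(outcome: &CheckOutcome) -> Action {
    match outcome {
        CheckOutcome::Valid => Action::Approve,
        CheckOutcome::InvalidCandidate(_) => Action::RaiseDispute,
        // Candidate-unrelated failure: we know nothing about validity.
        CheckOutcome::InternalError(_) => Action::Abstain,
    }
}
```

Under this sketch, an error such as `prevalidation: Other("cannot deserialize module: InvalidMagic")` would be represented as `CheckOutcome::InternalError` and would map to `Action::Abstain` rather than `Action::RaiseDispute`.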

Error conditions currently triggering a dispute to reconsider

Fixing ambiguous worker death disputes is handled in these tickets:
#6041
paritytech/polkadot-sdk#767

eskimor moved this to To do in Parachains-core Sep 26, 2022
eskimor added this to the slashing milestone Sep 26, 2022
eskimor added the T5-parachains_protocol (This PR/Issue is related to Parachains features and protocol changes.) label Sep 26, 2022
tugytur (Contributor) commented Sep 27, 2022

After switching a validator to a new server, we had prevalidation-related errors.
The errors occurred within the first 12 hours of becoming an active validator, but not in the first paravalidator session.
All the disputes were within one and the same session.

The validator is running 0.9.29 with rocksdb, and the db was synced from scratch with version 0.9.28.

The distinct kinds of error logs:

Failed to validate candidate para_id=Id(2007) error=InvalidCandidate(PrepareError("prevalidation: Other(\"cannot deserialize module: InvalidMemoryReference(3)\")"))
Detected invalid candidate as an approval checker. reason=ExecutionError("prevalidation: Other(\"cannot deserialize module: InvalidMemoryReference(3)\")") candidate_hash=#### para_id=Id(2007) traceID=####
New dispute initiated for candidate. candidate_hash=#### session=#### traceID=####

Failed to validate candidate para_id=Id(2000) error=InvalidCandidate(PrepareError("prevalidation: Other(\"cannot deserialize module: UnknownOpcode(24)\")"))
Detected invalid candidate as an approval checker. reason=ExecutionError("prevalidation: Other(\"cannot deserialize module: UnknownOpcode(24)\")") candidate_hash=#### para_id=Id(2000) traceID=####
New dispute initiated for candidate. candidate_hash=#### session=#### traceID=####
Dispute on candidate concluded with 'valid' result candidate_hash=#### session=#### traceID=####

Invalid candidate (basic checks) para_id=Id(2102)
Detected invalid candidate as an approval checker. reason=CodeHashMismatch candidate_hash=#### para_id=Id(2102) traceID=####
New dispute initiated for candidate. candidate_hash=#### session=24592 traceID=####
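For operators who want to check their own logs for this pattern, a small helper along the following lines could flag the relevant lines. It is a rough sketch that assumes the log wording shown above; the function name `is_prevalidation_failure` is hypothetical and not part of Polkadot.

```rust
/// Returns true if a log line reports an approval-checker failure whose
/// reason is a prevalidation error, assuming the log format quoted in
/// this thread. Hypothetical helper, not part of Polkadot.
pub fn is_prevalidation_failure(line: &str) -> bool {
    // Matching on two fragments tolerates both escaped and unescaped quotes
    // between `ExecutionError(` and `prevalidation:`.
    line.contains("reason=ExecutionError(") && line.contains("prevalidation:")
}
```

It matches only the "Detected invalid candidate as an approval checker" lines with an `ExecutionError` reason, so the `CodeHashMismatch` case and the "Failed to validate candidate" lines are not flagged.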

I had to replace identifiable information so as not to breach confidentiality.

Let me know if you would like to have this information. I should be able to reproduce it with one of our public validators, where I can share all the logs.

eskimor (Member, Author) commented Sep 28, 2022

Thank you @tugytur! This is very valuable feedback, we will look into it. What does switching to a new server mean exactly? Was there anything suspicious in the logs/syslogs at the time of the incident/switch?

eskimor (Member, Author) commented Sep 28, 2022

@pepyakin any ideas what could be causing such errors?

pepyakin (Contributor) commented

So this one apparently happens during prevalidation, which means it comes from a preparation worker. As can be seen from the logs, it receives some random-ish garbage.

The communication between the preparation worker and the PVF host happens through a Unix domain socket (UDS). I don't think any corruption can happen to the data in flight there.

tugytur (Contributor) commented Sep 28, 2022

> Thank you @tugytur! This is very valuable feedback, we will look into it. What does switching to a new server mean exactly? Anything suspicious in logs/syslogs at the time of the incident/switch?

@eskimor

It's a freshly configured server that was never used as a validator.
The fail-over was done by setting new session keys.

The snapshot was restored from a sync node that doesn't use the --validator flag.
Therefore no data was in the parachain folder until the server was started as a validator.

The logs showed no indication of any issues in that time period. The server meets all hardware requirements and runs nothing besides the validator.

eskimor (Member, Author) commented Sep 28, 2022

Have you used the provided polkadot binary or compiled it yourself?

tugytur (Contributor) commented Sep 28, 2022

> Have you used the provided polkadot binary or compiled it yourself?

I've used the provided binary.
