Do not re-prepare PVFs if not needed #4211

This looks brittle wrt adding new parameters. I would rewrite and use `!matches!`, such that any new parameter added is considered preparation-related unless we explicitly name it here as non-preparation-related.

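A minimal sketch of the suggested shape, using an illustrative enum rather than the real `ExecutorParam` definition (the variant names and types here are assumptions for the example):

```rust
// Illustrative stand-in for the executor parameter type; not the actual
// `ExecutorParam` enum from polkadot-primitives.
enum ExecutorParam {
    MaxMemoryPages(u32),
    PvfPrepTimeout(u64),
    PvfExecTimeout(u64),
    WasmExtBulkMemory,
}

// Keep only the parameters that should influence the prepared artifact.
// Anything not explicitly listed as execution-only is treated as
// preparation-related, so a newly added variant fails safe: it triggers
// re-preparation until someone deliberately exempts it here.
fn prep_related(params: &[ExecutorParam]) -> Vec<&ExecutorParam> {
    params
        .iter()
        .filter(|p| !matches!(p, ExecutorParam::PvfExecTimeout(_)))
        .collect()
}

fn main() {
    let params = vec![
        ExecutorParam::MaxMemoryPages(2048),
        ExecutorParam::PvfPrepTimeout(60),
        ExecutorParam::PvfExecTimeout(2),
        ExecutorParam::WasmExtBulkMemory,
    ];
    // Everything except the execution timeout counts towards preparation.
    assert_eq!(prep_related(&params).len(), 3);
}
```
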
That's an obvious improvement I've overlooked, thank you for the suggestion!

Another question I'm trying to answer: is `PvfPrepTimeout` a parameter that indeed affects preparation? That may be a bit counterintuitive, but I'm more inclined to say "no" than "yes". We have a strict pre-checking timeout, and we never re-pre-check, even if the executor environment params change. The preparation timeout is lenient, and anyway, those timeouts are non-deterministic. So I don't really see much sense in re-preparing artifacts if this timeout changes (especially given that we're never going to decrease them). WDYT?

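(For context, the two timeouts in question, in miniature. The kind names mirror polkadot's `PvfPrepKind`, but the concrete values are illustrative assumptions, not the real defaults:)

```rust
use std::time::Duration;

// Illustrative only; see the note above.
enum PvfPrepKind {
    // Strict: enforced once, when the PVF is pre-checked for enactment.
    Precheck,
    // Lenient: used for ordinary (re-)preparation on individual nodes,
    // where wall-clock timing is non-deterministic across machines.
    Prepare,
}

fn prep_timeout(kind: PvfPrepKind) -> Duration {
    match kind {
        PvfPrepKind::Precheck => Duration::from_secs(60),
        PvfPrepKind::Prepare => Duration::from_secs(240),
    }
}

fn main() {
    // The lenient timeout deliberately leaves headroom over the strict one.
    assert!(prep_timeout(PvfPrepKind::Prepare) > prep_timeout(PvfPrepKind::Precheck));
}
```
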
Let's just try to see what would happen if we say "no". Let's first pick the likely more troublesome situation: for some reason we reduce the timeout. Artifacts are not being recompiled, but a node that restarts will recompile and may now hit a timeout it did not hit before. If I am not mistaken, the node will not dispute, but simply won't vote, right? If that happens on just a few nodes, not too much harm is done. The problem: if this is a general issue, every time a node restarts we have another machine not voting, so we might run into finality issues.

Now let's flip it and say "yes". All validators will recompile immediately, including backers. I am assuming that anything that got backed with the old artifact will also be approved and disputed with the old artifact (this is true, right?). In that case the situation would be way better, because now we only have a single parachain that will completely cease to make progress, but all other paras and the relay chain won't be affected at all. Only the backers for that para will try to prepare and fail, as long as they are assigned.

If we assume we will only ever increase that timeout, then of course this should be fine, but is that a sound assumption? Why would we change the timeout at all?

I couldn't say which scenario is more likely. That we never ever re-precheck is more a bug than anything. What we should actually be doing is re-prechecking and only enacting the new params once that was successful. That would also avoid the finality issues on changing the parameters.

Yes, I believe so. Although we treat a preparation timeout as a deterministic error (since the PVF didn't fail during the pre-check), we don't dispute.

We should always use the execution parameters from the session in which the candidate was produced, so yes, candidates should always be executed with the same artifact.

Why do you think it's only a single parachain? The timeout is per-network, per-session, so after passing the session boundary where this parameter change is enacted, all the artifacts should be re-prepared, as we saw during the Kusama and Polkadot incidents.

I remember we discussed that decreasing the timeouts is not safe, as it may make PVFs that are already on-chain fail. At the very least, we should have some tooling to check that all the already-existing PVFs don't fail with the new set of executor parameters on the reference hardware (I think @ordian was working on that one?)

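(A rough sketch of the shape such tooling could take; `Pvf` and `prepare` below are hypothetical placeholders, since the actual tool isn't shown in this thread:)

```rust
use std::time::{Duration, Instant};

// Hypothetical placeholder for a registered PVF code blob.
struct Pvf {
    code: Vec<u8>,
}

// Hypothetical placeholder for the actual compilation step.
fn prepare(pvf: &Pvf) -> Result<(), String> {
    let _ = pvf.code.len();
    Ok(())
}

/// Re-run preparation for every already-registered PVF under a candidate
/// timeout, flagging any that would start failing after the change.
fn check_existing_pvfs(pvfs: &[Pvf], new_timeout: Duration) -> Result<(), String> {
    for pvf in pvfs {
        let started = Instant::now();
        prepare(pvf)?;
        if started.elapsed() > new_timeout {
            return Err("an existing PVF exceeds the proposed timeout".to_string());
        }
    }
    Ok(())
}

fn main() {
    let pvfs = vec![Pvf { code: vec![0u8; 16] }];
    assert!(check_existing_pvfs(&pvfs, Duration::from_secs(60)).is_ok());
}
```
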
The tooling we do have, but currently it's not integrated into the release process. And even if it were used to check the current PVFs at the time of the release, one could imagine an attacker uploading a new PVF (on an on-demand core) right after the release that sits right at the edge of the old time limit. So it doesn't solve the problem by itself.

Rerunning pre-checking before enactment would.

Not necessarily a single one, but most certainly not all of them. Otherwise we would have been really stupid with the parameters.

True, but I think this should be fixed. Without pre-checking we get a finality stall; that alone is reason enough to do it properly and not enact parameters that have not been checked. And assuming we want to change these parameters at all: why would we only ever want to increase? Most likely we'd need to increase to mitigate some issue, but would actually want to decrease again later, once it is fixed. Machines also get faster and compilers get better, so the chances that we'll actually want to decrease at some point are likely higher than the chances that we'll want to increase.

In other words, by saying "no" we dig ourselves deeper into a solution that is actually not sound.

But that's not true, is it?

In approval voting we fetch executor params based on the block in which the candidate is included: https://github.com/paritytech/polkadot-sdk/blob/master/polkadot/node/core/approval-voting/src/lib.rs#L2967

In backing we fetch them based on the relay_parent: https://github.com/paritytech/polkadot-sdk/blob/master/polkadot/node/core/backing/src/lib.rs#L666

I don't think there is anything preventing the two relay-chain blocks from being in different sessions at the boundary, especially with async backing.

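(The hazard in miniature; `session_index_for_child` below is a hypothetical stand-in for the runtime API call of the same name, with the hash type simplified to a byte array:)

```rust
// Hypothetical stand-in for the runtime API's `session_index_for_child`.
fn session_index_for_child(_block_hash: [u8; 32]) -> u32 {
    0 // placeholder
}

/// Backing resolves executor params at the candidate's relay parent, while
/// approval voting resolves them at the block of inclusion. Nothing forces
/// the two lookups into the same session at a session boundary.
fn same_session(relay_parent: [u8; 32], included_at: [u8; 32]) -> bool {
    session_index_for_child(relay_parent) == session_index_for_child(included_at)
}

fn main() {
    let (relay_parent, included_at) = ([0u8; 32], [1u8; 32]);
    // With the placeholder this trivially agrees; on a real chain it may not.
    assert!(same_session(relay_parent, included_at));
}
```
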
Hmmm, that may be an important find... If that does not hold, it should be fixed. The very purpose of the executor parameters is to always use the same set of parameters with the same candidate.