Validator Re-Enabling #5724

Merged: 57 commits merged into master on Nov 19, 2024

@Overkillus (Contributor) commented Sep 16, 2024

Aims to implement Stage 3 of Validator Disabling as outlined in #4359.

Features:

  • New Disabling Strategy (Staking level)
  • Re-enabling logic (Session level)
  • More generic disabling decision output
  • New Disabling Events
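To make the first three features more concrete, here is a minimal, self-contained Rust sketch of an "up to a limit, with re-enabling" disabling strategy. Everything in it is an assumption for illustration only: the names `Decision`, `UpToLimitWithReEnabling` and `apply`, the `u32` severity in parts per billion, and the swap rule (at the limit, a strictly more severe offender displaces the least severe disabled validator, which is re-enabled) are hypothetical and do not reproduce the actual pallet-staking or pallet-session API.

```rust
/// Hypothetical decision returned by the strategy: at most one validator to
/// disable and at most one to re-enable (both are validator indices).
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Decision {
    pub disable: Option<u32>,
    pub reenable: Option<u32>,
}

/// Hypothetical strategy: keep at most `limit` validators disabled. Once the
/// limit is reached, a strictly more severe offender displaces the least
/// severe disabled validator, which is re-enabled.
pub struct UpToLimitWithReEnabling {
    pub limit: usize,
}

impl UpToLimitWithReEnabling {
    /// `disabled` holds `(validator_index, severity)` pairs; severity is an
    /// assumed parts-per-billion slash fraction.
    pub fn decide(&self, disabled: &[(u32, u32)], offender: u32, severity: u32) -> Decision {
        // Already disabled: this sketch makes no new decision.
        if disabled.iter().any(|&(v, _)| v == offender) {
            return Decision::default();
        }
        // Below the limit: simply disable the offender.
        if disabled.len() < self.limit {
            return Decision { disable: Some(offender), reenable: None };
        }
        // At the limit: compare against the least severe disabled validator.
        match disabled.iter().min_by_key(|&&(_, s)| s) {
            Some(&(weakest, weakest_severity)) if severity > weakest_severity => Decision {
                disable: Some(offender),
                reenable: Some(weakest),
            },
            // Not more severe (or limit is zero): leave the set unchanged.
            _ => Decision::default(),
        }
    }
}

/// Applies a decision to the disabled set: re-enable first, then disable.
pub fn apply(disabled: &mut Vec<(u32, u32)>, decision: &Decision, severity: u32) {
    if let Some(v) = decision.reenable {
        disabled.retain(|&(idx, _)| idx != v);
    }
    if let Some(v) = decision.disable {
        disabled.push((v, severity));
    }
}

fn main() {
    let strategy = UpToLimitWithReEnabling { limit: 2 };
    let mut disabled: Vec<(u32, u32)> = Vec::new();
    // (offender index, severity) pairs arriving in order.
    for &(offender, severity) in &[(1, 100), (2, 500), (3, 50), (4, 900)] {
        let decision = strategy.decide(&disabled, offender, severity);
        apply(&mut disabled, &decision, severity);
        println!("offender {offender}: {decision:?} -> {disabled:?}");
    }
}
```

With a limit of 2, offender 3 (severity 50) leaves the set unchanged, while offender 4 (severity 900) displaces validator 1, leaving `[(2, 500), (4, 900)]`. Returning an explicit disable/re-enable pair rather than a bare index is what lets a consumer act on, and emit an event for, each half of the decision; this is how the sketch is shaped, not a description of the merged code.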

Testing & Security:

  • Unit tests
  • Mock tests
  • Try-runtime checks
  • Try-runtime tested on westend snap
  • Try-runtime CI tests
  • Re-enabling Zombienet Test (?)
  • SRLabs Audit

Closes #4745
Closes #2418

@Overkillus Overkillus added I1-security The node fails to follow expected, security-sensitive, behaviour. T8-polkadot This PR/Issue is related to/affects the Polkadot network. labels Sep 16, 2024
@Overkillus Overkillus self-assigned this Sep 16, 2024
@tdimitrov (Contributor) left a comment

I had a quick pass by focusing mainly on the approach. It looks good, nice work @Overkillus!

I've left some thoughts about a corner case with the re-enabling.

substrate/frame/staking/src/lib.rs (outdated, resolved)
substrate/frame/staking/src/slashing.rs (2 threads, outdated, resolved)
@Overkillus Overkillus changed the title Validator Re-Enabling (master PR) Validator Re-Enabling Sep 24, 2024
@Overkillus Overkillus marked this pull request as ready for review September 30, 2024 09:33
@Overkillus (Contributor, Author) commented Nov 11, 2024

@Ank4n

@Ank4n wrote: "I wouldn't try to block you if you want to go ahead with this PR, especially since most of the things I flagged should ideally have been flagged with the earlier PRs, and this PR does not introduce a new design decision."

That is what I would suggest. This implementation is simply part of an earlier design that was pre-approved by SRLabs in the design audit. It adds some new functionality on the same infrastructure without altering it much.

@Ank4n wrote: "Since this PR touches the logic that we know we will have to migrate in the next couple of months, it is reasonable enough to make those changes in this PR."

We have the power to open multiple PRs to separate unrelated changes between the PRs. Enabling new functionality in the current design should not come with a major refactor if it is not necessary. I am open to the refactor at a later stage, but nevertheless it should be a separate PR.

This PR does not make the future refactor harder or easier. I would understand withholding the change if it made the refactor harder, but it does not; the two are orthogonal.

On top of all that, this PR is a security-related fix. It has a higher priority than a general refactor and should not be bundled with one without a good reason; doing so would only obscure the auditing process.

The rest of the discussion covers why the refactor might be a good idea later on, which we should move into a separate issue or ticket. I'd be happy to participate in those discussions and help as much as I can.

@tdimitrov (Contributor)

@Overkillus wrote: "We have the power to open multiple PRs to separate unrelated changes between the PRs. Enabling new functionality in the current design should not come with a major refactor if it is not necessary. I am open to the refactor at a later stage, but nevertheless it should be a separate PR."

I agree with @Overkillus here. Considering that this PR is part of the disabling strategy rollout, I am strongly against doing any out-of-scope refactorings.

What @Ank4n suggests definitely makes sense and we should do it, but as a separate effort. Let's focus on the disabling strategy in this PR.

@sandreim (Contributor)

@Ank4n wrote: "Since this PR touches the logic that we know we will have to migrate in the next couple of months, it is reasonable enough to make those changes in this PR."

Given that this PR doesn't make the future migration harder, your suggestion is best handled as a separate task/PR. In general, I think it is not a good idea to require unrelated refactorings in the scope of a change that is concerned with security.

@Ank4n please take another look and if there are no other causes of concern I would say we should merge. We want to get this change in production as soon as possible.

@Ank4n (Contributor) commented Nov 12, 2024

@tdimitrov @sandreim The refactoring needed is not unrelated. But blocking also serves no purpose, and we can handle this in a follow-up issue.

@Overkillus Could you check if it's possible to get rid of the storage item DisabledValidators in pallet-staking and only maintain it in pallet-session (even if it results in somewhat messy code)? This would make the follow-up refactor a no-op. Otherwise, I'm ready to approve.

@gpestana You may want to take a quick look at the changes as well.

@Overkillus (Contributor, Author)

Added all the defensive suggestions from @Ank4n. Good eye for spotting them, thanks!

@Ank4n wrote: "Could you check if it's possible to get rid of the storage item DisabledValidators in pallet-staking and only maintain it in pallet-session (even if it results in a bit messy code)?"

This is not trivial. 99% of the disabling logic lives in staking, so moving it over is not easy, and doing so is a major part of the needed refactor.

polkadot/runtime/westend/src/lib.rs (outdated, resolved)
@Ank4n Ank4n requested a review from gpestana November 16, 2024 07:15
@paritytech-workflow-stopper

All GitHub workflows were cancelled due to the failure of one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/11892215786
Failed job name: fmt

@Overkillus Overkillus added this pull request to the merge queue Nov 18, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 18, 2024
@alvicsam alvicsam added this pull request to the merge queue Nov 18, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 18, 2024
@Overkillus (Contributor, Author)

Tests in CI failed on test-linux-stable (1/3, parity-large-persistent):

   FLAKY 4/6 [  31.550s] pallet-revive-eth-rpc tests::deploy_and_call
   FLAKY 2/6 [   0.107s] polkadot-dispute-distribution tests::send_dispute_gets_cleaned_up
   FLAKY 2/6 [ 126.884s] polkadot-node-core-pvf::it execute_job_terminates_on_timeout
  TRY 6 FAIL [  30.260s] pallet-revive-eth-rpc tests::native_evm_ratio_works
  TRY 6 FAIL [   2.420s] sc-rpc-spec-v2 chain_head::tests::ensure_operation_limits_works
  • eth-rpc tests (deploy_and_call & native_evm_ratio_works) seem to be very flaky overall; native_evm_ratio_works always fails locally (0/30)
  • send_dispute_gets_cleaned_up is a bit weirder: it passes locally with no failures (100/100)
  • execute_job_terminates_on_timeout, always passes locally (100/100)
  • ensure_operation_limits_works, always passes locally (100/100)

Looking into the flaky tests further.

@Overkillus Overkillus added this pull request to the merge queue Nov 19, 2024
Merged via the queue into master with commit 8d4138f Nov 19, 2024
190 of 198 checks passed
@Overkillus Overkillus deleted the mkz-re-enabling branch November 19, 2024 15:32
Projects
Status: Backlog
Status: Completed
6 participants