Fix Register validator e2e test #1178

@Stefan-Ethernal Stefan-Ethernal commented Jan 31, 2023

Description

This PR fixes the register validator e2e test. It was failing for two reasons, described below.

Misplaced NextValidatorsHash query

The register validator e2e test is flaky because it queries ExtraData and NextValidatorsHash before making sure that the newly registered validator has joined the consensus protocol.

The proposed solution is to query the block and check the next validators hash only after confirming that the validator is present in the new validator set.

Newly registered validator receives 0 for rewards

The second reason the test was failing was the new validator's rewards. The rewards were 0 even after the validator had registered, joined the validator set, and a few epochs had passed. This happened because the validator never signed any block, so it was not present in the uptime data. The validator wasn't signing because the FSM calculation was never cancelled on receiving fresh blocks, so the FSM kept working with 'stale' data.

In other words, this occurred:

  1. The validator joins the validator set, the FSM is initialized, and the sequence is run for the new height (e.g. 15)
  2. In the meantime, other nodes move to the new proposal height 16
  3. HasQuorum is called for the proposal message on the new validator, and block number 16 is propagated as the new height. The call returns false because c.fsm.proposerSnapshot.GetLatestProposer compares the FSM height against the received block number, and the two differ: the FSM stayed at height 15 while the proposal moved on. This is visible in the validator's log:
    "get latest proposer not found - height: 16, round: 0, pc height: 15, pc round: 0"
  4. The validator keeps getting new blocks from the syncer, which should cancel the current sequence and start a new one, but that never happens because of the bug
  5. The check in polybft.go that should write to syncerBlockCh and cause sequence cancellation is the following:
if ev.Source == "syncer" && ev.NewChain[0].Number > p.blockchain.CurrentHeader().Number {
	syncerBlockCh <- struct{}{}
}

The problem is that the syncer calls WriteFullBlock for the newly received block. That function first writes the block to the blockchain and then calls b.dispatchEvent(evnt), which generates an event about the inserted block. Because the block has already been inserted by the time the event is handled, ev.NewChain[0].Number and p.blockchain.CurrentHeader().Number are equal, so the strict `>` comparison is never true.
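
The write-then-dispatch ordering can be reproduced with a small simulation. This is a sketch under stated assumptions: `header`, `event`, `chain`, `writeFullBlock`, and `strictSignals` are simplified stand-ins invented for illustration, not the project's real types or functions.

```go
package main

import "fmt"

// Simplified stand-ins for the real types; names are illustrative only.
type header struct{ Number uint64 }

type event struct {
	Source   string
	NewChain []header
}

type chain struct {
	current header
	onEvent func(event)
}

// writeFullBlock mimics the ordering described above:
// the block is stored first, and only then is the event dispatched.
func (c *chain) writeFullBlock(h header, source string) {
	c.current = h
	if c.onEvent != nil {
		c.onEvent(event{Source: source, NewChain: []header{h}})
	}
}

// strictSignals counts how often the original `>` check fires while the
// syncer writes n consecutive blocks on top of height start.
func strictSignals(start uint64, n int) int {
	c := &chain{current: header{Number: start}}
	signals := 0
	c.onEvent = func(ev event) {
		if ev.Source == "syncer" && ev.NewChain[0].Number > c.current.Number {
			signals++
		}
	}
	for i := 1; i <= n; i++ {
		c.writeFullBlock(header{Number: start + uint64(i)}, "syncer")
	}
	return signals
}

func main() {
	// The handler always sees the block it is being told about as the current one.
	fmt.Println("cancellation signals sent:", strictSignals(15, 3))
}
```

In this model the handler never observes an event height above the stored height, so the count stays at zero no matter how many blocks the syncer writes.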

Proposed solution:
The fix for this issue is also included in this PR. It allows sequence cancellation when the syncer block height is equal to the current blockchain height, not only when it is greater.
At that moment the FSM might be calculating a block for a lower or the same height, and it is fine to interrupt it: the write from consensus would not happen anyway, because the syncer already took the lock and wrote the block for that height. An FSM calculation for a block newer than the saved one can be interrupted only if, for example, the node becomes a validator and the new sequence starts after the syncer wrote the block (with the current height + 1) but before the event is handled. In that case the calculation for the new block is interrupted, but a new sequence is started again.
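
A minimal sketch of the relaxed condition and its effect on a running sequence is shown here. It assumes the fix amounts to changing `>` to `>=`; the exact diff is not reproduced in this description. `shouldCancelSequence` and `runSequence` are hypothetical names, and the context-based cancellation is a stand-in for however the real sequence is stopped.

```go
package main

import (
	"context"
	"fmt"
)

// shouldCancelSequence is a hypothetical predicate mirroring the check in the
// event handler, with the comparison relaxed from `>` to `>=`.
func shouldCancelSequence(source string, eventHeight, currentHeight uint64) bool {
	return source == "syncer" && eventHeight >= currentHeight
}

// runSequence pretends to work on one height until it is cancelled,
// then reports which height it was working on.
func runSequence(ctx context.Context, height uint64, done chan<- uint64) {
	<-ctx.Done()
	done <- height
}

func main() {
	syncerBlockCh := make(chan struct{}, 1)
	done := make(chan uint64, 1)

	// The FSM is still working on height 15.
	ctx, cancel := context.WithCancel(context.Background())
	go runSequence(ctx, 15, done)

	// The syncer has already stored block 16, so the event height
	// equals the current height when the event is handled.
	if shouldCancelSequence("syncer", 16, 16) {
		syncerBlockCh <- struct{}{}
	}

	select {
	case <-syncerBlockCh:
		cancel()
		fmt.Println("cancelled stale sequence at height", <-done)
	default:
		cancel()
		fmt.Println("no cancellation signal")
	}
}
```

With the strict comparison the signal would never be sent in this scenario, and the sequence for height 15 would keep running on stale data.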

Additional comments

The e2e test will still fail occasionally (though less often than it does now) with a validators' hash mismatch until #1191 is merged.

Changes include

  • Bugfix (non-breaking change that solves an issue)
  • Hotfix (change that solves an urgent issue, and requires immediate attention)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)

Checklist

  • I have assigned this PR to myself
  • I have added at least 1 reviewer
  • I have added the relevant labels
  • I have updated the official documentation
  • I have added sufficient documentation in code

Testing

  • I have tested this code with the official test suite
  • I have tested this code manually

@Stefan-Ethernal Stefan-Ethernal self-assigned this Jan 31, 2023
@Stefan-Ethernal Stefan-Ethernal added the bug fix Functionality that fixes a bug label Jan 31, 2023

codecov bot commented Jan 31, 2023

Codecov Report

Merging #1178 (3a064f9) into develop (f434a9b) will increase coverage by 0.00%.
The diff coverage is 0.00%.

❗ Current head 3a064f9 differs from pull request most recent head 860c4cd. Consider uploading reports for the commit 860c4cd to get more accurate results

@@           Coverage Diff            @@
##           develop    #1178   +/-   ##
========================================
  Coverage    54.68%   54.68%           
========================================
  Files          176      176           
  Lines        23553    23547    -6     
========================================
- Hits         12880    12877    -3     
+ Misses        9645     9641    -4     
- Partials      1028     1029    +1     
Impacted Files Coverage Δ
consensus/polybft/extra.go 85.82% <0.00%> (+0.67%) ⬆️
consensus/polybft/fsm.go 69.75% <0.00%> (+0.56%) ⬆️
syncer/client.go 61.79% <0.00%> (-1.42%) ⬇️


Base automatically changed from feature/v3-parity to develop February 6, 2023 07:53
@stana-miric stana-miric force-pushed the EVM-434-fix-test-e-2-e-consensus-register-validator-e-2-e-test branch from c022ea0 to 3a064f9 Compare February 6, 2023 09:35
@stana-miric stana-miric marked this pull request as ready for review February 7, 2023 07:45
@vcastellm vcastellm left a comment


LGTM good catch! 🎩

@Stefan-Ethernal
Collaborator Author

Good catch 🔥

P.s. not able to approve it, because I'm a creator of PR

@stana-miric stana-miric merged commit 8e3646e into develop Feb 7, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Feb 7, 2023
@Stefan-Ethernal Stefan-Ethernal deleted the EVM-434-fix-test-e-2-e-consensus-register-validator-e-2-e-test branch February 7, 2023 14:24