
Empty validator set returned by forkManager.GetValidators(height) when on second fork that is PoA #812

Open
dankostiuk opened this issue Oct 18, 2022 · 5 comments
Labels
in the pipeline Logged into our issue tracking pipeline

Comments

@dankostiuk
Contributor

dankostiuk commented Oct 18, 2022


Description

During testing of v0.6.0 on our 4-validator testnet, we found that we encounter a chain halt as soon as a non-genesis fork is reached.

While remote debugging after removing a validator's entire data-dir (including snapshots) and attempting to re-sync, we observed that forkManager.GetValidators(height), called via ibft.Initialize() -> ibft.updateCurrentModules() -> ibft.getModulesFromForkManager(), returns an empty validator set ([]).

As a result, all 4 validators appear to be stuck in the discovery loop and can no longer build new blocks:

Oct 18 14:11:03 ip-172-31-93-217 main[746699]: 2022-10-18T14:11:03.465Z [DEBUG] polygon.network.discovery: Querying a peer for near peers: peer=16Uiu2HAmUrjw2e5sKM2JK7YMkKGE4Ft549wmYWEKCXamqnKhLH8B
Oct 18 14:11:03 ip-172-31-93-217 main[746699]: 2022-10-18T14:11:03.826Z [DEBUG] polygon.network.discovery: Found new near peers: peer=2
Oct 18 14:11:08 ip-172-31-93-217 main[746699]: 2022-10-18T14:11:08.503Z [DEBUG] polygon.network.discovery: Querying a peer for near peers: peer=16Uiu2HAmJ3jwuuFGeNi8GJB44QKfHjDwLcRfS4zRGBNbyLQF73tD
Oct 18 14:11:08 ip-172-31-93-217 main[746699]: 2022-10-18T14:11:08.704Z [DEBUG] polygon.network.discovery: Found new near peers: peer=2
Oct 18 14:11:13 ip-172-31-93-217 main[746699]: 2022-10-18T14:11:13.446Z [DEBUG] polygon.network.discovery: Querying a peer for near peers: peer=16Uiu2HAmUrjw2e5sKM2JK7YMkKGE4Ft549wmYWEKCXamqnKhLH8B
Oct 18 14:11:13 ip-172-31-93-217 main[746699]: 2022-10-18T14:11:13.606Z [DEBUG] polygon.network.discovery: Found new near peers: peer=2
Oct 18 14:11:18 ip-172-31-93-217 main[746699]: 2022-10-18T14:11:18.447Z [DEBUG] polygon.network.discovery: Querying a peer for near peers: peer=16Uiu2HAmUrjw2e5sKM2JK7YMkKGE4Ft549wmYWEKCXamqnKhLH8B
Oct 18 14:11:18 ip-172-31-93-217 main[746699]: 2022-10-18T14:11:18.647Z [DEBUG] polygon.network.discovery: Found new near peers: peer=2
Oct 18 14:11:23 ip-172-31-93-217 main[746699]: 2022-10-18T14:11:23.459Z [DEBUG] polygon.network.discovery: Querying a peer for near peers: peer=16Uiu2HAmUrjw2e5sKM2JK7YMkKGE4Ft549wmYWEKCXamqnKhLH8B
Oct 18 14:11:23 ip-172-31-93-217 main[746699]: 2022-10-18T14:11:23.580Z [DEBUG] polygon.network.discovery: Found new near peers: peer=2

We wipe our data-dir as shown below:

rm -rf data-dir/blockchain/ data-dir/trie/ data-dir/consensus/snapshots data-dir/consensus/metadata

Our genesis.json:

Note that we've modified our logic to allow adjacent PoA-ecdsa forks, since we have custom PreCommitState logic that runs on an interval defined by forkEpochSize; we would expect the same issue to occur when going from PoS-ecdsa to PoA-ecdsa.

 "ibft": {
                "epochSize": 100,
                "quorumSizeBlockNum": 5,
                "types": [
                    {
                        "type": "PoA",
                        "validator_type": "ecdsa",
                        "from": "0x0",
                        "to": "0x12b"
                    },
                    {
                        "type": "PoA",
                        "validator_type": "ecdsa",
                        "from": "0x12c",
                        "validators": [],
                        "forkEpochSize": 50
                    }
                ],
                "validator_type": "ecdsa"


Steps to reproduce

  1. Spin up a 4-validator testnet and have the validators build blocks; the genesis should look like:
   "ibft": {
                "epochSize": 100,
                "quorumSizeBlockNum": 5,
                "type": "PoA",
                "validator_type": "ecdsa"
            }
  2. Call the switch command to create a PoA-ecdsa fork at block 300; the genesis should then look like:
 "ibft": {
                "epochSize": 100,
                "quorumSizeBlockNum": 5,
                "types": [
                    {
                        "type": "PoA",
                        "validator_type": "ecdsa",
                        "from": "0x0",
                        "to": "0x12b"
                    },
                    {
                        "type": "PoA",
                        "validator_type": "ecdsa",
                        "from": "0x12c",
                        "validators": [],
                        "forkEpochSize": 50
                    }
                ],
                "validator_type": "ecdsa"
            }
  3. Observe that as soon as block 0x12c (300) is reached, the validators get stuck in the discovery loop.
  4. Wipe the data-dir and resync a single validator:
rm -rf data-dir/blockchain/ data-dir/trie/ data-dir/consensus/snapshots data-dir/consensus/metadata

We observe that the validator syncs to block 300 and then gets stuck in the discovery loop; remote debugging shows that the ValidatorStore at the second fork has an empty validator set.

Expected behaviour

I expect block building to continue across forks as it did when running our logic on v0.5.1.

Additionally, validators that completely remove their data-dir (including the snapshots file) should still be able to sync up to the latest block.

Actual behaviour

Block building no longer takes place once the new fork has been reached.

Proposed solution

Any time the validator set returned from forkManager.GetValidators(height) is empty ([]), one could derive the validators from the header's ibftExtra, with something similar to this:

	signer, validators, hooks, err := getModulesFromForkManager(i.forkManager, height)
	if err != nil {
		return err
	}

	// Fall back to the validator set encoded in the latest header's
	// ibftExtra whenever the fork configuration yields an empty set
	if validators.Len() == 0 {
		validators, _ = signer.GetValidators(i.blockchain.Header())
	}

This is similar to what happens on the genesis fork: if the snapshot data cannot be found, the validatorSet is read from the genesis:

	if header.Number == 0 {
		// Genesis header needs to be set by hand, all the other
		// snapshots are set as part of processHeaders
		if err := s.addHeaderSnap(header); err != nil {
			return err
		}
	}
@laviniat1996
Contributor

Hi @dankostiuk ,

Can you try this on develop too? I think it might be related to this issue we had, which was recently solved with this PR. Does this happen only in PoA, or in PoS too?

@dankostiuk
Contributor Author

Hey @laviniat1996 ,

So the issue was that we forgot to include the --ibft-validator flags when defining our new fork. This wasn't obvious to us, since we never had to do this pre-v0.6.0. With the flags in place (see the example below), block building continues as expected and validators can sync up to the latest block without issues.
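For reference, the kind of switch invocation that was missing looks roughly like this (a sketch only: the exact flag syntax may differ by version, the addresses are placeholders, and one --ibft-validator flag is passed per validator):

polygon-edge ibft switch --chain genesis.json \
    --type PoA --ibft-validator-type ecdsa --from 300 \
    --ibft-validator 0x<validator-1-address> \
    --ibft-validator 0x<validator-2-address> \
    --ibft-validator 0x<validator-3-address> \
    --ibft-validator 0x<validator-4-address>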

As the validator set can change at any time during a fork, we believe the logic around the snapshot validatorStore/forkManager could be improved to do the following instead:

  1. treat the PoA fork's validators field as optional, so that when the snapshot's validatorSet is empty, the validator set can instead be obtained from the required block header's ibftExtra (signer.GetValidators(header))

If the above is not possible, I suppose the second-best solution would be:

  1. upon creating a new PoA fork using switch, automatically take the latest/current validator set and include it within the fork, so that we don't have to explicitly list every validator address each time we create a new fork (see the sketch below)
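To illustrate that second suggestion: if switch inherited the live validator set, the invocation could stay as minimal as this (hypothetical behaviour, not what the tool does today):

# Hypothetical: no --ibft-validator flags needed; the tool would embed
# the validator set active at the switch height into the new fork
polygon-edge ibft switch --chain genesis.json \
    --type PoA --ibft-validator-type ecdsa --from 400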

@laviniat1996
Contributor

@dankostiuk Thank you for these suggestions. I will bring them up to the team, and we will get back to you. 🙏

@laviniat1996 laviniat1996 self-assigned this Oct 19, 2022
@laviniat1996
Contributor

@dankostiuk Just wanted to give a quick update. Sorry for the wait. These suggestions were raised with the team, and they will go over them as soon as they can.

@laviniat1996 laviniat1996 added the in the pipeline Logged into our issue tracking pipeline label Oct 26, 2022
@ivanbozic21

@dankostiuk
We still don't have an update about this request, but please know that we will continue to keep you posted.

@ivanbozic21 ivanbozic21 self-assigned this Dec 26, 2022
@ivanbozic21 ivanbozic21 removed their assignment Feb 9, 2023
@laviniat1996 laviniat1996 removed their assignment Feb 9, 2023