
failed to get validators when calculation quorum: height=151 err="not found snapshot" #734

Closed
NicoDFS opened this issue Sep 14, 2022 · 7 comments · Fixed by #791

NicoDFS commented Sep 14, 2022

not found snapshot

Description

I am getting the error `failed to get validators when calculation quorum: height=151 err="not found snapshot"` when setting up a local PoA network (not using the --pos flag). I have tried starting the network with combinations of 4, 5, and 6 nodes as validators. Each time, the chain stops at block 151 and I get the error.

If I stop all the nodes and start them again, the chain runs again until it stops at block 301.

I never see this error when adding --pos to the genesis; the chain runs for thousands of blocks with no issue or error. But it seems to me that the network should be started in PoA and upgraded to PoS once it is large enough. Maybe I'm wrong in that thinking, or maybe my genesis command is missing something?

My environment

  • Ubuntu 18
  • Polygon Edge v0.5.0
  • Development Branch

Steps to reproduce

  • Set up a local chain as per https://docs.polygon.technology/docs/edge/get-started/set-up-ibft-locally
  • Generate the genesis with `polygon-edge genesis --consensus ibft --ibft-validators-prefix-path test-chain- --bootnode /ip4/127.0.0.1/tcp/10001/p2p/16Uiu2HAmFaPaZFepxjUohvudCN15TBskCzusVHt6xbhoHJvWLCru --bootnode /ip4/127.0.0.1/tcp/20001/p2p/16Uiu2HAmGTAVAB7n4xJbbQdJK1UiFuBRDHaZt1GamS1PN1nYRXLj --premine=<myaddress>:100000000000000000000000000 --epoch-size 50 --name edge-testnet --chain-id 1717`
  • Start all 4 nodes
  • The error comes at block 151 and then again at 301 (see the sketch after this list)
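For context, both stall heights sit one block past a multiple of three epochs with --epoch-size 50 (3 * 50 = 150 and 6 * 50 = 300). A minimal Go sketch of that arithmetic (illustration only, not polygon-edge code):

```go
// Illustration only: relate the reported stall heights to --epoch-size 50.
package main

import "fmt"

func main() {
	const epochSize = 50
	for _, stall := range []int{151, 301} {
		epochs := (stall - 1) / epochSize
		fmt.Printf("block %d = %d epochs * %d + 1\n", stall, epochs, epochSize)
	}
}
```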
laviniat1996 (Contributor) commented:

Hello @NicoDFS,

Thank you for opening this issue!

Can you please provide the logs too? They would help speed up the troubleshooting process. Also, which commands did you use to start the nodes?

NicoDFS (Author) commented Sep 14, 2022

Hi @laviniat1996, thanks for getting back to me. I started the nodes with:

```
edge server --data-dir ./test-chain-2 --chain genesis.json --grpc-address :20000 --libp2p :20001 --jsonrpc :20002 --seal
edge server --data-dir ./test-chain-3 --chain genesis.json --grpc-address :30000 --libp2p :30001 --jsonrpc :30002 --seal
edge server --data-dir ./test-chain-4 --chain genesis.json --grpc-address :40000 --libp2p :40001 --jsonrpc :40002 --seal
```

I didn't think to start with the log flag, but this is what the LOG files from test-chain-1/blockchain look like:

```
=============== Sep 14, 2022 (PDT) ===============
15:12:47.240205 log@legend F·NumFile S·FileSize N·Entry C·BadEntry B·BadBlock Ke·KeyError D·DroppedEntry L·Level Q·SeqNum T·TimeElapsed
15:12:47.241839 db@open opening
15:12:47.242000 version@stat F·[] S·0B[] Sc·[]
15:12:47.242573 db@janitor F·2 G·0
15:12:47.242590 db@open done T·740.167µs
15:33:35.197606 db@close closing
15:33:35.197708 db@close done T·102.361µs
=============== Sep 14, 2022 (PDT) ===============
15:33:49.688600 log@legend F·NumFile S·FileSize N·Entry C·BadEntry B·BadBlock Ke·KeyError D·DroppedEntry L·Level Q·SeqNum T·TimeElapsed
15:33:49.688687 version@stat F·[] S·0B[] Sc·[]
15:33:49.688719 db@open opening
15:33:49.688759 journal@recovery F·1
15:33:49.689849 journal@recovery recovering @1
15:33:49.692398 memdb@flush created L0@2 N·1055 S·109KiB "b\x00\x82..\x8aF\x95,v419":"r\xfe\x0e..\x9fR\x1c,v922"
15:33:49.692605 version@stat F·[1] S·109KiB[109KiB] Sc·[0.25]
15:33:49.695019 db@janitor F·3 G·0
15:33:49.695052 db@open done T·6.324799ms
15:46:43.640539 db@close closing
15:46:43.640628 db@close done T·88.639µs
=============== Sep 14, 2022 (PDT) ===============
15:46:50.601678 log@legend F·NumFile S·FileSize N·Entry C·BadEntry B·BadBlock Ke·KeyError D·DroppedEntry L·Level Q·SeqNum T·TimeElapsed
15:46:50.601775 version@stat F·[1] S·109KiB[109KiB] Sc·[0.25]
15:46:50.601786 db@open opening
15:46:50.601819 journal@recovery F·1
15:46:50.601906 journal@recovery recovering @3
15:46:50.604626 memdb@flush created L0@5 N·1050 S·110KiB "b\x02\t..\xdd\xc7\x11,v1603":"r\xff6...Ǯ,v1861"
15:46:50.604767 version@stat F·[2] S·219KiB[219KiB] Sc·[0.50]
15:46:50.607226 db@janitor F·4 G·0
15:46:50.607265 db@open done T·5.472512ms
15:55:07.978365 db@close closing
15:55:07.978473 db@close done T·107.39µs
```

Thanks

NicoDFS (Author) commented Sep 15, 2022

@laviniat1996 That seems to have worked, thank you. One question, please, before we close this: in a real-world deployment, when a network upgrades from PoA to PoS by running the ibft switch command, can the --epoch-size flag be used? Thank you.

ivanbozic21 added the investigating label Sep 15, 2022
laviniat1996 (Contributor) commented:

Hi @NicoDFS, I have deleted my previous comment because it contained some bad information. I have been able to reproduce the issue. This is an actual bug and we are looking into it: every third epoch, block production seems to stop. Recreating the genesis file with the epoch flag removed only appeared to solve the issue because the default epoch size is 100000, so the chain never reached that value.
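As a minimal, hypothetical sketch of that failure mode (invented names, not the actual polygon-edge internals): if the validator snapshot for an epoch boundary is never persisted, every quorum calculation for heights past that boundary fails with exactly the reported error.

```go
// Hypothetical illustration of the failure mode; not polygon-edge code.
package main

import (
	"errors"
	"fmt"
)

const epochSize = 50 // the --epoch-size used in the report

var errNotFound = errors.New("not found snapshot")

// snapshotStore maps an epoch-boundary height to the validator set
// snapshotted there (a stand-in for the real snapshot store).
type snapshotStore map[uint64][]string

// validatorsAt resolves the validator set for a height via the snapshot
// taken at the last epoch boundary at or below that height.
func (s snapshotStore) validatorsAt(height uint64) ([]string, error) {
	boundary := (height / epochSize) * epochSize
	vals, ok := s[boundary]
	if !ok {
		return nil, fmt.Errorf("failed to get validators when calculation quorum: height=%d err=%q", height, errNotFound)
	}
	return vals, nil
}

func main() {
	store := snapshotStore{
		0:   {"A", "B", "C", "D"},
		50:  {"A", "B", "C", "D"},
		100: {"A", "B", "C", "D"},
		// the snapshot at boundary 150 is never written -- the bug
	}
	if _, err := store.validatorsAt(151); err != nil {
		fmt.Println(err) // height=151 err="not found snapshot"
	}
}
```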

In a real-world deployment, recreating the genesis file is not an option, and --epoch-size can be set only at genesis, not when you make the PoA->PoS switch. Also, this flag shouldn't affect block production in PoA.

Thank you so much for letting us know about this issue! We will give you updates about the progress regarding this.

NicoDFS (Author) commented Sep 15, 2022

Thanks @laviniat1996

laviniat1996 added the bug and in the pipeline labels Sep 16, 2022
ivanbozic21 reopened this Oct 3, 2022
zivkovicmilos removed the bug label Oct 7, 2022
ivanbozic21 commented:

@NicoDFS apologies for the late response; we will have an update for you sometime next week.

zivkovicmilos added the bug label and removed the investigating label Oct 11, 2022
zivkovicmilos linked a pull request Oct 11, 2022 that will close this issue
zivkovicmilos (Contributor) commented:

Hey @NicoDFS,

Apologies for the long wait on this fix rollout - I've opened up a PR that resolves this problem, and linked it to this issue.

Thank you for alerting us about it 🙏
