
failed to get validators when calculation quorum: height=151 err="not found snapshot" #734

Closed
NicoDFS opened this issue Sep 14, 2022 · 7 comments · Fixed by #791

NicoDFS commented Sep 14, 2022

not found snapshot

Description

I am getting the error `failed to get validators when calculation quorum: height=151 err="not found snapshot"` when setting up a local PoA network (not using the --pos flag). I have tried starting the network with combinations of 4, 5, and 6 nodes as validators. Each time, the chain stops at block 151 and I get the error.

If I stop all the nodes and start them again, the chain runs again until it stops at block 301.

I never see this error when adding --pos to the genesis; the chain runs for thousands of blocks with no issue or error. But it seems to me that the network should be started in PoA and upgraded to PoS once it is large enough. Maybe I'm wrong in that thinking, or maybe my genesis command is missing something?

My environment

  • Ubuntu 18
  • Polygon Edge v0.5.0
  • Development Branch

Steps to reproduce

  • Set up a local chain as per https://docs.polygon.technology/docs/edge/get-started/set-up-ibft-locally
  • Generate the genesis with `polygon-edge genesis --consensus ibft --ibft-validators-prefix-path test-chain- --bootnode /ip4/127.0.0.1/tcp/10001/p2p/16Uiu2HAmFaPaZFepxjUohvudCN15TBskCzusVHt6xbhoHJvWLCru --bootnode /ip4/127.0.0.1/tcp/20001/p2p/16Uiu2HAmGTAVAB7n4xJbbQdJK1UiFuBRDHaZt1GamS1PN1nYRXLj --premine=<myaddress>:100000000000000000000000000 --epoch-size 50 --name edge-testnet --chain-id 1717`
  • Start all 4 nodes
  • The error comes at block 151 and then again at 301 (see the sketch after this list)
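For context, both stall heights sit one block past a multiple of three epochs with --epoch-size 50 (3 * 50 = 150 and 6 * 50 = 300). A minimal Go sketch of that arithmetic (illustration only, not polygon-edge code):

```go
// Illustration only: relate the reported stall heights to --epoch-size 50.
package main

import "fmt"

func main() {
	const epochSize = 50
	for _, stall := range []int{151, 301} {
		epochs := (stall - 1) / epochSize
		fmt.Printf("block %d = %d epochs * %d + 1\n", stall, epochs, epochSize)
	}
}
```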
laviniat1996 (Contributor) commented:

Hello @NicoDFS,

Thank you for opening this issue!

Can you please provide the logs too? They would help speed up the troubleshooting process. Also, which commands did you use to start the nodes?

NicoDFS (Author) commented Sep 14, 2022

Hi @laviniat1996, thanks for getting back to me. I started the nodes with:

```
edge server --data-dir ./test-chain-2 --chain genesis.json --grpc-address :20000 --libp2p :20001 --jsonrpc :20002 --seal
edge server --data-dir ./test-chain-3 --chain genesis.json --grpc-address :30000 --libp2p :30001 --jsonrpc :30002 --seal
edge server --data-dir ./test-chain-4 --chain genesis.json --grpc-address :40000 --libp2p :40001 --jsonrpc :40002 --seal
```

I didn't think to start with the log flag, but this is what the LOG files from test-chain-1/blockchain look like:

```
=============== Sep 14, 2022 (PDT) ===============
15:12:47.240205 log@legend F·NumFile S·FileSize N·Entry C·BadEntry B·BadBlock Ke·KeyError D·DroppedEntry L·Level Q·SeqNum T·TimeElapsed
15:12:47.241839 db@open opening
15:12:47.242000 version@stat F·[] S·0B[] Sc·[]
15:12:47.242573 db@janitor F·2 G·0
15:12:47.242590 db@open done T·740.167µs
15:33:35.197606 db@close closing
15:33:35.197708 db@close done T·102.361µs
=============== Sep 14, 2022 (PDT) ===============
15:33:49.688600 log@legend F·NumFile S·FileSize N·Entry C·BadEntry B·BadBlock Ke·KeyError D·DroppedEntry L·Level Q·SeqNum T·TimeElapsed
15:33:49.688687 version@stat F·[] S·0B[] Sc·[]
15:33:49.688719 db@open opening
15:33:49.688759 journal@recovery F·1
15:33:49.689849 journal@recovery recovering @1
15:33:49.692398 memdb@flush created L0@2 N·1055 S·109KiB "b\x00\x82..\x8aF\x95,v419":"r\xfe\x0e..\x9fR\x1c,v922"
15:33:49.692605 version@stat F·[1] S·109KiB[109KiB] Sc·[0.25]
15:33:49.695019 db@janitor F·3 G·0
15:33:49.695052 db@open done T·6.324799ms
15:46:43.640539 db@close closing
15:46:43.640628 db@close done T·88.639µs
=============== Sep 14, 2022 (PDT) ===============
15:46:50.601678 log@legend F·NumFile S·FileSize N·Entry C·BadEntry B·BadBlock Ke·KeyError D·DroppedEntry L·Level Q·SeqNum T·TimeElapsed
15:46:50.601775 version@stat F·[1] S·109KiB[109KiB] Sc·[0.25]
15:46:50.601786 db@open opening
15:46:50.601819 journal@recovery F·1
15:46:50.601906 journal@recovery recovering @3
15:46:50.604626 memdb@flush created L0@5 N·1050 S·110KiB "b\x02\t..\xdd\xc7\x11,v1603":"r\xff6...Ǯ,v1861"
15:46:50.604767 version@stat F·[2] S·219KiB[219KiB] Sc·[0.50]
15:46:50.607226 db@janitor F·4 G·0
15:46:50.607265 db@open done T·5.472512ms
15:55:07.978365 db@close closing
15:55:07.978473 db@close done T·107.39µs
```

Thanks

NicoDFS (Author) commented Sep 15, 2022

@laviniat1996 That seems to have worked, thank you. One question, please, before we close this: in a real-world deployment, when a network upgrades from PoA to PoS by running the ibft switch command, can the --epoch-size flag be used? Thank you.

ivanbozic21 added the investigating label Sep 15, 2022
laviniat1996 (Contributor) commented:

Hi @NicoDFS, I have deleted my previous comment because it contained some bad information. I have been able to reproduce the issue. This is an actual bug and we are looking into it: every third epoch, block production seems to stop. Recreating the genesis file with the epoch flag removed only appeared to solve the issue because the default epoch size is 100000, so the chain never reached that value.
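As a minimal, hypothetical sketch of that failure mode (invented names, not the actual polygon-edge internals): if the validator snapshot for an epoch boundary is never persisted, every quorum calculation for heights past that boundary fails with exactly the reported error.

```go
// Hypothetical illustration of the failure mode; not polygon-edge code.
package main

import (
	"errors"
	"fmt"
)

const epochSize = 50 // the --epoch-size used in the report

var errNotFound = errors.New("not found snapshot")

// snapshotStore maps an epoch-boundary height to the validator set
// snapshotted there (a stand-in for the real snapshot store).
type snapshotStore map[uint64][]string

// validatorsAt resolves the validator set for a height via the snapshot
// taken at the last epoch boundary at or below that height.
func (s snapshotStore) validatorsAt(height uint64) ([]string, error) {
	boundary := (height / epochSize) * epochSize
	vals, ok := s[boundary]
	if !ok {
		return nil, fmt.Errorf("failed to get validators when calculation quorum: height=%d err=%q", height, errNotFound)
	}
	return vals, nil
}

func main() {
	store := snapshotStore{
		0:   {"A", "B", "C", "D"},
		50:  {"A", "B", "C", "D"},
		100: {"A", "B", "C", "D"},
		// the snapshot at boundary 150 is never written -- the bug
	}
	if _, err := store.validatorsAt(151); err != nil {
		fmt.Println(err) // height=151 err="not found snapshot"
	}
}
```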

In a real-world deployment, recreating the genesis file is not an option, and --epoch-size can be set only at genesis, not when you make the PoA->PoS switch. Also, this flag shouldn't affect block production in PoA.

Thank you so much for letting us know about this issue! We will give you updates about the progress regarding this.

NicoDFS (Author) commented Sep 15, 2022

Thanks @laviniat1996

laviniat1996 added the bug and in the pipeline labels Sep 16, 2022
ivanbozic21 reopened this Oct 3, 2022
zivkovicmilos removed the bug label Oct 7, 2022
ivanbozic21 commented:

@NicoDFS apologies for the late response; we will have an update for you sometime next week.

zivkovicmilos added the bug label and removed the investigating label Oct 11, 2022
zivkovicmilos linked a pull request Oct 11, 2022 that will close this issue
zivkovicmilos (Contributor) commented:

Hey @NicoDFS,

Apologies for the long wait on this fix rollout - I've opened up a PR that resolves this problem, and linked it to this issue.

Thank you for alerting us about it 🙏
