
Sync Ropsten test net stuck at block 12572551 #4084

Closed
ronaldjmaas opened this issue Jul 12, 2022 · 20 comments
Labels
TeamChupa GH issues worked on by Chupacabara Team

Comments

@ronaldjmaas

Description

Besu: 22.4.4
Lighthouse BN: 2.3.2-rc0
Both clients run on Debian 11.4.
No Docker or VM
Restarting Besu did not resolve the issue

Stack trace:
Jul 11 09:57:48 xeon taskset[11653]: 2022-07-11 09:57:48.376-07:00 | vert.x-worker-thread-0 | INFO | EngineForkchoiceUpdated | Consensus fork-choice-update: head: 0x0ec70021cb7ee7d9ef7b63cf8a0496a8960d3ee41ffff14a0167816756f9fc8c, finalized: 0x21805ada950c787ca8edbdb7fe4ff4afd9f347baaaff8a091ef5d736001ee712, safeBlockHash: 0x0ec70021cb7ee7d9ef7b63cf8a0496a8960d3ee41ffff14a0167816756f9fc8c
Jul 11 09:58:00 xeon taskset[11653]: 2022-07-11 09:58:00.527-07:00 | vert.x-worker-thread-0 | INFO | EngineForkchoiceUpdated | Consensus fork-choice-update: head: 0x5024aa168c378a0424c9db476b89e72ef7183d3085e504f7e5f5fa9cc956e770, finalized: 0x21805ada950c787ca8edbdb7fe4ff4afd9f347baaaff8a091ef5d736001ee712, safeBlockHash: 0x5024aa168c378a0424c9db476b89e72ef7183d3085e504f7e5f5fa9cc956e770
Jul 11 09:58:08 xeon taskset[11653]: 2022-07-11 09:58:08.519-07:00 | Timer-0 | INFO | DNSTimerTask | Refreshing DNS records with enrtree://AKA3AM6LPBYEUDMVNU3BSVQJ5AD45Y7YPOHJLEF6W26QOE4VTUDPE@all.ropsten.ethdisco.net
Jul 11 09:58:13 xeon taskset[11653]: 2022-07-11 09:58:13.903-07:00 | Timer-0 | INFO | DNSResolver | Resolved 219 nodes
Jul 11 09:58:24 xeon taskset[11653]: java.lang.RuntimeException: java.lang.IllegalStateException: Address=0xbc07d8635ae85e1282dc9579867ed5eb88cd838d Prior Value in Rolling Change: nonces differ
Jul 11 09:58:24 xeon taskset[11653]: at org.hyperledger.besu.ethereum.bonsai.BonsaiWorldStateArchive.getMutable(BonsaiWorldStateArchive.java:254)
Jul 11 09:58:24 xeon taskset[11653]: at org.hyperledger.besu.ethereum.MainnetBlockValidator.validateAndProcessBlock(MainnetBlockValidator.java:100)
Jul 11 09:58:24 xeon taskset[11653]: at org.hyperledger.besu.consensus.merge.blockcreation.MergeCoordinator.validateBlock(MergeCoordinator.java:256)
Jul 11 09:58:24 xeon taskset[11653]: at org.hyperledger.besu.consensus.merge.blockcreation.MergeCoordinator.rememberBlock(MergeCoordinator.java:277)
Jul 11 09:58:24 xeon taskset[11653]: at org.hyperledger.besu.consensus.merge.blockcreation.TransitionCoordinator.rememberBlock(TransitionCoordinator.java:137)
Jul 11 09:58:24 xeon taskset[11653]: at org.hyperledger.besu.ethereum.api.jsonrpc.internal.methods.engine.EngineNewPayload.syncResponse(EngineNewPayload.java:180)
Jul 11 09:58:24 xeon taskset[11653]: at org.hyperledger.besu.ethereum.api.jsonrpc.internal.methods.ExecutionEngineJsonRpcMethod.lambda$response$0(ExecutionEngineJsonRpcMethod.java:62)
Jul 11 09:58:24 xeon taskset[11653]: at io.vertx.core.impl.ContextImpl.lambda$null$0(ContextImpl.java:159)
Jul 11 09:58:24 xeon taskset[11653]: at io.vertx.core.impl.AbstractContext.dispatch(AbstractContext.java:100)
Jul 11 09:58:24 xeon taskset[11653]: at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$1(ContextImpl.java:157)
Jul 11 09:58:24 xeon taskset[11653]: at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
Jul 11 09:58:24 xeon taskset[11653]: at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
Jul 11 09:58:24 xeon taskset[11653]: at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
Jul 11 09:58:24 xeon taskset[11653]: at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
Jul 11 09:58:24 xeon taskset[11653]: at java.base/java.lang.Thread.run(Thread.java:833)
Jul 11 09:58:24 xeon taskset[11653]: Caused by: java.lang.IllegalStateException: Address=0xbc07d8635ae85e1282dc9579867ed5eb88cd838d Prior Value in Rolling Change: nonces differ
Jul 11 09:58:24 xeon taskset[11653]: at org.hyperledger.besu.ethereum.bonsai.BonsaiAccount.assertCloseEnoughForDiffing(BonsaiAccount.java:303)
Jul 11 09:58:24 xeon taskset[11653]: at org.hyperledger.besu.ethereum.bonsai.BonsaiWorldStateUpdater.rollAccountChange(BonsaiWorldStateUpdater.java:497)
Jul 11 09:58:24 xeon taskset[11653]: at org.hyperledger.besu.ethereum.bonsai.BonsaiWorldStateUpdater.lambda$rollForward$9(BonsaiWorldStateUpdater.java:422)
Jul 11 09:58:24 xeon taskset[11653]: at java.base/java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1850)
Jul 11 09:58:24 xeon taskset[11653]: at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762)
Jul 11 09:58:24 xeon taskset[11653]: at org.hyperledger.besu.ethereum.bonsai.BonsaiWorldStateUpdater.rollForward(BonsaiWorldStateUpdater.java:420)
Jul 11 09:58:24 xeon taskset[11653]: at org.hyperledger.besu.ethereum.bonsai.BonsaiWorldStateArchive.getMutable(BonsaiWorldStateArchive.java:243)
Jul 11 09:58:24 xeon taskset[11653]: ... 14 more
Jul 11 09:58:24 xeon taskset[11653]: 2022-07-11 09:58:24.672-07:00 | vert.x-worker-thread-0 | ERROR | MainnetBlockValidator | Unable to process block because parent world state 0x0e2db064aad212af5efeeebb70510134b284694d0e403d3a9c08b3f5d01e8038 is not available. Block 12572551 (0x510af2b80837e23e43db9f76ca2062b08f6292d44963bd4689e8d4fb905f2f0e)

@SeaMonkey82

MainnetBlockValidator | Unable to process block because parent world state 0x0e2db064aad212af5efeeebb70510134b284694d0e403d3a9c08b3f5d01e8038 is not available.

#4052

You will need to resync from scratch.

@ronaldjmaas
Author

ronaldjmaas commented Jul 12, 2022

This is the result of a sync from scratch with 22.4.4. Note that initially Besu / Lighthouse were able to sync successfully with the head of the Ropsten chain, but after stopping them for a day and then restarting both clients, syncing failed again.

@matkt
Contributor

matkt commented Jul 13, 2022

Do you still have the issue? It would be nice if we could investigate your database.

@ronaldjmaas
Author

At the moment I am rerunning the same scenario using the Besu + Lighthouse combo:

  1. Start Ropsten sync from scratch until successfully synced with head of chain
  2. Stop both clients for 24 hours
  3. Start both clients for 24 hours
  4. Go to step 2

Unfortunately I do not have the original database anymore, but I will let you know when the issue reoccurs.
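
For reference, a rough sketch of how that cycle could be automated, assuming both clients run as systemd services (the unit names "besu" and "lighthousebeacon" below are placeholders for whatever your setup uses):

    #!/usr/bin/env bash
    # Hypothetical reproduction loop: alternate roughly 24 h offline / 24 h online.
    while true; do
      sudo systemctl stop lighthousebeacon besu    # step 2: stop both clients
      sleep 24h
      sudo systemctl start besu lighthousebeacon   # step 3: start both clients
      sleep 24h
    done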

@matkt
Contributor

matkt commented Jul 13, 2022

Thanks, we are also trying to reproduce it.

@ronaldjmaas
Author

That morning I started Besu at Jul 11 08:24:46, after it had been offline for about 14 hours.
Then, after printing the cute pandas, Besu + Lighthouse synced without errors until Jul 11 09:58:00.

@ronaldjmaas
Author

Just before the exception, Besu was synced to the head of the chain. The timestamp of head 0x5024aa168c378a0424c9db476b89e72ef7183d3085e504f7e5f5fa9cc956e770 is 2022-07-11 16:58:00 UTC / 09:58:00 PDT.

@iamhsk iamhsk added TeamChupa GH issues worked on by Chupacabara Team and removed TeamChupa GH issues worked on by Chupacabara Team labels Jul 14, 2022
@ronaldjmaas
Author

This morning the exact same exception happened again, but now with the Sepolia test network. I captured the database. If you are still interested in it, let me know how I can send it to you. The tar file is about 1.25 GB.

I will continue testing once the candidate fix has been released.

@matkt
Contributor

matkt commented Jul 17, 2022

It wasn't with the candidate fix, which is not yet merged?

@ronaldjmaas
Author

Both errors occurred with 22.4.4 (without the candidate fix). Once you bake a new release of Besu, I will start retesting.

Thank you

@matkt
Contributor

matkt commented Jul 17, 2022

OK, thanks. We are currently testing this fix. If the fix is not enough, we would like to recover the database. Thank you for your help.

@iamhsk iamhsk added the TeamChupa GH issues worked on by Chupacabara Team label Jul 18, 2022
@jflo
Contributor

jflo commented Jul 19, 2022

By any chance were you syncing the beacon chain at the same time as Besu was syncing on Ropsten and/or Sepolia? I have experienced Besu getting stuck at the latest pivot block and not proceeding past it when I have my CL (teku) syncing at the same time as Besu.

@ronaldjmaas
Author

Yes, the CL (Lighthouse) and EL (Besu) were always syncing together. Post-merge it is necessary to have both clients running to be able to successfully sync to the head of the chain. At least that is my understanding.

@ajsutton
Contributor

Both need to be running, but they don't both need to be syncing. In fact, if the CL isn't yet in sync the EL will have a very hard time syncing. Best practice is to use checkpoint sync for the CL (it should be in sync in a couple of minutes); it can then keep the EL informed of the current chain head so the EL can download the appropriate world state as part of its sync.
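
For example, a minimal sketch of that setup; the checkpoint URL, JWT path, and data path below are placeholders, and flag spellings differ between client releases, so check each client's --help:

    # Besu on Ropsten, exposing the Engine API to the CL via the shared JWT secret
    # (flag names as documented for Besu 22.x; verify against `besu --help`).
    besu --network=ropsten \
         --data-path=/data/besu \
         --engine-jwt-secret=/secrets/jwt.hex

    # Lighthouse beacon node: checkpoint-sync from a trusted endpoint, then drive Besu
    # (older Lighthouse releases spell these --execution-endpoints / --jwt-secrets).
    lighthouse bn --network ropsten \
                  --checkpoint-sync-url https://example-checkpoint.invalid \
                  --execution-endpoint http://127.0.0.1:8551 \
                  --execution-jwt /secrets/jwt.hex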

@ronaldjmaas
Author

OK, I will do that once a new release of Besu is available.

@ronaldjmaas
Author

ronaldjmaas commented Jul 20, 2022

But note that in real life the CL + EL will sometimes sync simultaneously. E.g. my validator is set up to restart both clients automatically after a planned reboot and also in case of a power outage. The first scenario is usually done in a few minutes, but in the second scenario both clients may be offline for several hours or maybe even longer.

Regardless of the scenario, restarting the server should never result in a corrupted database or a client that is eventually unable to sync.

@ajsutton
Contributor

But note that in real life the CL + EL will sometimes sync simultaneously. E.g. my validator is set up to restart both clients automatically after a planned reboot and also in case of a power outage. The first scenario is usually done in a few minutes, but in the second scenario both clients may be offline for several hours or maybe even longer.

That's fine - there's a very big difference between the initial sync and catching up after a restart. During the initial sync the EL (Besu) typically uses either fast sync or snap sync to download the world state. In either case it needs to know what the current chain head is so it can request the state from peers (who only keep the state for blocks close to the chain head). Once it finishes that initial sync the EL has the world state and is able to sync "in lock step" with the CL by applying each block as the CL does, and it will catch up that way.

Regardless of the scenario, restarting the server should never result in a corrupted database or a client that is eventually unable to sync.

Definitely. I hope I haven't added confusion to this issue - just wanted to clarify that it's normal for the EL to wind up waiting for the CL to sync (or at least progress further in its sync) because it needs the CL to tell it what the chain head is post-merge.
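
Concretely, the "lock step" is the CL calling the Engine API on the EL for every block: roughly an engine_newPayloadV1 carrying the block's execution payload, followed by an engine_forkchoiceUpdatedV1 with the new head/safe/finalized hashes (the EngineForkchoiceUpdated lines in the log above). A hand-rolled illustration of that traffic, with placeholder hashes and a $JWT that would have to be a token signed with the shared secret:

    # engine_newPayloadV1: hand the EL the execution payload of the new block.
    curl -s http://127.0.0.1:8551 \
      -H "Authorization: Bearer $JWT" \
      -H 'Content-Type: application/json' \
      -d '{"jsonrpc":"2.0","id":1,"method":"engine_newPayloadV1","params":[{"blockHash":"0x...","...":"remaining payload fields"}]}'

    # engine_forkchoiceUpdatedV1: tell the EL the new head / safe / finalized blocks.
    curl -s http://127.0.0.1:8551 \
      -H "Authorization: Bearer $JWT" \
      -H 'Content-Type: application/json' \
      -d '{"jsonrpc":"2.0","id":2,"method":"engine_forkchoiceUpdatedV1","params":[{"headBlockHash":"0x...","safeBlockHash":"0x...","finalizedBlockHash":"0x..."},null]}'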

@ronaldjmaas
Author

Ok that makes sense. Thanks for clarifying.

@non-fungible-nelson
Contributor

non-fungible-nelson commented Jul 26, 2022

Hi @ronaldjmaas - have you been able to reproduce your issue on the latest release candidate (RC2)? If not, I would like to close this issue. Thank you!

@ronaldjmaas
Author

I was not able to reproduce this issue with RC2.
