Fix for backward sync wrongly thinking it is done after a restart #5182

fab-10 · 2023-03-07T11:02:56Z

PR description

There is an issue when Besu is restarted while a backward sync session is running. After the restart, the consensus client may send an FcU or a NewPayload for a block that is present in the backward sync storage but not yet imported, and therefore not on the main chain. The backward sync nevertheless decides there is nothing to do for that block and returns as if the sync had completed. Since the sync is not actually done, this results in the internal error that the finalized block is not present.

The solution is to persist the backward sync status, so in case of a restart, it can resume from where it was interrupted.
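To make the failure mode concrete, here is a minimal sketch of the decision taken when the consensus client announces a head after a restart. It is an illustration only: `RestartDecisionSketch`, its fields, and the `Action` values are hypothetical names, blocks are reduced to hash strings, and none of this is Besu's actual code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the restart scenario; not Besu's classes.
final class RestartDecisionSketch {
  enum Action { ALREADY_IMPORTED, RESUME_SESSION, START_NEW_SESSION }

  // Hashes of blocks already imported on the main chain.
  final Set<String> importedBlocks = new HashSet<>();
  // Hashes of blocks downloaded by backward sync but not yet imported.
  final Set<String> backwardSyncStorage = new HashSet<>();

  Action onHeadFromConsensusClient(final String headHash) {
    if (importedBlocks.contains(headHash)) {
      // Only a block on the main chain means there is nothing left to do.
      return Action.ALREADY_IMPORTED;
    }
    if (backwardSyncStorage.contains(headHash)) {
      // Known but not imported: the interrupted session still has work to do.
      return Action.RESUME_SESSION;
    }
    return Action.START_NEW_SESSION;
  }
}
```

The point of the sketch is the middle branch: a block that is only in the backward sync storage must not be treated as completed work.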

Fixed Issue(s)

fixes #5053

Documentation

I thought about documentation and added the doc-change-required label to this PR if
updates are required.

Acceptance Tests (Non Mainnet)

I have considered running ./gradlew acceptanceTestNonMainnet locally if my PR affects non-mainnet modules.

Changelog

I thought about the changelog and included a changelog update if required.

Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>

…u restart" This reverts commit e7ac9e5. Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net> # Conflicts: # ethereum/eth/src/main/java/org/hyperledger/besu/ethereum/eth/sync/backwardsync/BackwardSyncContext.java

Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>

siladu · 2023-03-08T22:38:12Z

Hi @fab-10 can you add details of any testing you've done to the description please.

fab-10 · 2023-03-09T08:36:16Z

To test, I used a mainnet Lighthouse-Besu node with a long backward sync (days) that was experiencing the problem. After applying this fix I restarted it many times; the issue was no longer reported and the sync eventually finished.

Note: if you apply this fix to an instance that is currently doing a backward sync, the first start still does not have the stored state, so it could still report the issue. The workaround is to delete the CL data, let it checkpoint sync (it only takes seconds), and restart Besu, so that the CL sends a fresh hash that is not in the backward sync storage.

Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>

siladu

I don't know this code well enough to approve, trying to learn it hence the questions!


siladu · 2023-03-10T07:26:53Z

...eum/eth/src/main/java/org/hyperledger/besu/ethereum/eth/sync/backwardsync/BackwardChain.java

-        .log();
+
+    if (firstStoredAncestor.isEmpty()) {
+      updateLastStorePivot(Optional.of(blockHeader));


Why don't we add the blockHeader to the chainStorage here as well?

Does firstStoredAncestor.isEmpty represent the first BWS block that we've received?

fab-10 · Mar 10, 2023

> Why don't we add the blockHeader to the chainStorage here as well?

chainStorage holds the sequence of blocks to import: each entry maps blockHash -> nextBlockHash, so you know which block to import next. The block headers are saved in a separate table.

> Does firstStoredAncestor.isEmpty represent the first BWS block that we've received?

It is empty at the beginning of the backward sync (BWS) and is then updated, while going backward or forward, to point to the current block. It is one of the variables that was not persisted but is required to resume the session.
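The layout described in this reply can be sketched roughly as follows. This is an in-memory illustration under stated assumptions, not Besu's `BackwardChain`: the class name, `prependAncestor`, and `pendingImports` are hypothetical, hashes are plain strings, and a header is reduced to its block number.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for the storage layout discussed above.
final class BackwardChainSketch {
  // blockHash -> nextBlockHash: the order in which blocks should be imported.
  private final Map<String, String> chainStorage = new HashMap<>();
  // blockHash -> header, kept in a separate table (header modeled as a block number).
  private final Map<String, Long> headers = new HashMap<>();
  // Current position of the session; empty until the first block is stored.
  // This is the kind of value that has to be persisted to resume after a restart.
  private Optional<String> firstStoredAncestor = Optional.empty();

  // Going backward: link a newly downloaded parent in front of the current position.
  void prependAncestor(final String hash, final long number) {
    firstStoredAncestor.ifPresent(child -> chainStorage.put(hash, child));
    headers.put(hash, number);
    firstStoredAncestor = Optional.of(hash);
  }

  Optional<String> firstStoredAncestor() {
    return firstStoredAncestor;
  }

  // Blocks still to import, in import order, starting from the stored position.
  List<String> pendingImports() {
    final List<String> order = new ArrayList<>();
    Optional<String> current = firstStoredAncestor;
    while (current.isPresent()) {
      order.add(current.get());
      current = Optional.ofNullable(chainStorage.get(current.get()));
    }
    return order;
  }
}
```

In this model, losing `firstStoredAncestor` on restart leaves the hash -> next-hash links intact but gives no starting point to walk them from, which is why the position has to be stored alongside them.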

...src/main/java/org/hyperledger/besu/ethereum/eth/sync/backwardsync/BackwardSyncAlgorithm.java

...eum/eth/src/main/java/org/hyperledger/besu/ethereum/eth/sync/backwardsync/BackwardChain.java

matkt · 2023-03-10T16:20:10Z

tried the heal with this PR and it seems to be good

Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>

macfarla

looks ok to me - one minor comment

...h/src/main/java/org/hyperledger/besu/ethereum/eth/sync/backwardsync/BackwardSyncContext.java

.../eth/src/main/java/org/hyperledger/besu/ethereum/eth/sync/backwardsync/BackwardSyncStep.java

Co-authored-by: Sally MacFarlane <macfarla.github@gmail.com> Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>


siladu

LGTM


fab-10 added 2 commits March 7, 2023 11:48

Fix for backward sync wrongly thinking it is done after a Besu restart

e7ac9e5

Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>

Add CHANGELOG entry

dae939e

Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>

fab-10 marked this pull request as draft March 7, 2023 13:28

Remember backward sync status accross restarts

c2c88b1

Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>

fab-10 marked this pull request as ready for review March 7, 2023 16:27

fab-10 force-pushed the bws-fix-restart branch from af2a492 to db9205f Compare March 7, 2023 16:30

Persist backward sync status to support resuming across restarts

667e82a

Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>

fab-10 force-pushed the bws-fix-restart branch from db9205f to 667e82a Compare March 7, 2023 16:37

Merge branch 'main' into bws-fix-restart

b21e023

non-fungible-nelson added the bug, syncing, and TeamChupa labels Mar 7, 2023

Fix tests

11957c1

Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>

fab-10 force-pushed the bws-fix-restart branch from ca53f1a to 11957c1 Compare March 7, 2023 18:11

Merge branch 'main' into bws-fix-restart

0094db4

Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>

siladu reviewed Mar 10, 2023


fab-10 added 2 commits March 10, 2023 17:33

Fix typo

9eea27d

Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>

Merge branch 'main' into bws-fix-restart

015c421

fab-10 force-pushed the bws-fix-restart branch from c29c9df to 015c421 Compare March 10, 2023 16:34

fab-10 added 2 commits March 13, 2023 11:22

Merge branch 'main' into bws-fix-restart

eb34387

Renaming and comment from code review

fc351dd

Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>

fab-10 requested a review from siladu March 13, 2023 10:32

macfarla reviewed Mar 15, 2023

...h/src/main/java/org/hyperledger/besu/ethereum/eth/sync/backwardsync/BackwardSyncContext.java (outdated)

jframe reviewed Mar 15, 2023

.../eth/src/main/java/org/hyperledger/besu/ethereum/eth/sync/backwardsync/BackwardSyncStep.java

fab-10 and others added 2 commits March 15, 2023 09:50

[skip ci] fix typo

fa37c78

Co-authored-by: Sally MacFarlane <macfarla.github@gmail.com> Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>

Merge branch 'main' into bws-fix-restart

cd8bf03

Merge branch 'bws-fix-restart' of github.com:fab-10/besu into bws-fix…

d090e17

…-restart

siladu approved these changes Mar 15, 2023


fab-10 added this pull request to the merge queue Mar 15, 2023

Merged via the queue into hyperledger:main with commit c0c329f Mar 15, 2023

fab-10 deleted the bws-fix-restart branch March 15, 2023 11:28
