Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for backward sync wrongly thinking it is done after a restart #5182

Merged
merged 15 commits into from
Mar 15, 2023

Conversation

fab-10
Copy link
Contributor

@fab-10 fab-10 commented Mar 7, 2023

PR description

There is an issue when restarting Besu when a backward sync session is running, since after the restart it is possible that the Consensus client sends a FcU or a NewPayload for a block that is present in the backward sync storage, but not yet imported, so not on the main chain, but still the backward sync thinks it should not do anything with that block, so it returns like it has completed the sync, but since the sync is not done actually then the internal error that the finalize block is not present.

The solution is to persist the backward sync status, so in case of a restart, it can resume from where it was interrupted.

Fixed Issue(s)

fixes #5053

Documentation

  • I thought about documentation and added the doc-change-required label to this PR if
    updates are required.

Acceptance Tests (Non Mainnet)

  • I have considered running ./gradlew acceptanceTestNonMainnet locally if my PR affects non-mainnet modules.

Changelog

Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
@fab-10 fab-10 marked this pull request as draft March 7, 2023 13:28
Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
@fab-10 fab-10 marked this pull request as ready for review March 7, 2023 16:27
…u restart"

This reverts commit e7ac9e5.

Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>

# Conflicts:
#	ethereum/eth/src/main/java/org/hyperledger/besu/ethereum/eth/sync/backwardsync/BackwardSyncContext.java
Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
@non-fungible-nelson non-fungible-nelson added bug Something isn't working syncing TeamChupa GH issues worked on by Chupacabara Team labels Mar 7, 2023
Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
@siladu
Copy link
Contributor

siladu commented Mar 8, 2023

Hi @fab-10 can you add details of any testing you've done to the description please.

@fab-10
Copy link
Contributor Author

fab-10 commented Mar 9, 2023

To test I have used a mainnet Lighthouse-Besu, with a long backward sync (days), that was experiencing the problem, and after applying this fix, I have restarted it many times, this issue was not reported anymore and the sync eventually finished.

Note: in case you apply this fix to an instance that is currently doing a backward sync, of course on the first start it still does not have the stored state, so it could still report the issue, so the workaround is: delete the CL data and let it checkpoint sync (it only takes seconds) and restart Besu, so the CL could send a fresh hash that is not in the backward sync storage

Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
Copy link
Contributor

@siladu siladu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know this code well enough to approve, trying to learn it hence the questions!

.log();

if (firstStoredAncestor.isEmpty()) {
updateLastStorePivot(Optional.of(blockHeader));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we add the blockHeader to the chainStorage here as well?

Does firstStoredAncestor.isEmpty represent the first BWS block that we've received?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we add the blockHeader to the chainStorage here as well?

chainStorage is the sequence of the blocks to import, each entry is blockHash -> nextBlockHash, so you know what block to import next, while the block headers are saved in another table.

Does firstStoredAncestor.isEmpty represent the first BWS block that we've received?

it is empty at the beginning of the bws, and then it is updated when going backward/forward to point to the current block, and this is one of the variables that was not stored, but is required for to resume the session

@matkt
Copy link
Contributor

matkt commented Mar 10, 2023

tried the heal with this PR and it seems to be good

Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
@fab-10 fab-10 requested a review from siladu March 13, 2023 10:32
Copy link
Contributor

@macfarla macfarla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks ok to me - one minor comment

fab-10 and others added 2 commits March 15, 2023 09:50
Co-authored-by: Sally MacFarlane <macfarla.github@gmail.com>
Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
Copy link
Contributor

@siladu siladu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fab-10 fab-10 added this pull request to the merge queue Mar 15, 2023
Merged via the queue into hyperledger:main with commit c0c329f Mar 15, 2023
@fab-10 fab-10 deleted the bws-fix-restart branch March 15, 2023 11:28
elenduuche pushed a commit to elenduuche/besu that referenced this pull request Aug 16, 2023
…perledger#5182)

<!-- Thanks for sending a pull request! Please check out our
contribution guidelines: -->
<!-- https://github.com/hyperledger/besu/blob/main/CONTRIBUTING.md -->

## PR description

There is an issue when restarting Besu when a backward sync session is
running, since after the restart it is possible that the Consensus
client sends a FcU or a NewPayload for a block that is present in the
backward sync storage, but not yet imported, so not on the main chain,
but still the backward sync thinks it should not do anything with that
block, so it returns like it has completed the sync, but since the sync
is not done actually then the internal error that the finalize block is
not present.

The solution is to persist the backward sync status, so in case of a
restart, it can resume from where it was interrupted.

## Fixed Issue(s)
<!-- Please link to fixed issue(s) here using format: fixes #<issue
number> -->
<!-- Example: "fixes #2" -->

fixes hyperledger#5053 

## Documentation

- [x] I thought about documentation and added the `doc-change-required`
label to this PR if
[updates are
required](https://wiki.hyperledger.org/display/BESU/Documentation).

## Acceptance Tests (Non Mainnet)

- [x] I have considered running `./gradlew acceptanceTestNonMainnet`
locally if my PR affects non-mainnet modules.

## Changelog

- [x] I thought about the changelog and included a [changelog update if
required](https://wiki.hyperledger.org/display/BESU/Changelog).

---------

Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
Co-authored-by: Sally MacFarlane <macfarla.github@gmail.com>
eum602 pushed a commit to lacchain/besu that referenced this pull request Nov 3, 2023
…perledger#5182)

<!-- Thanks for sending a pull request! Please check out our
contribution guidelines: -->
<!-- https://github.com/hyperledger/besu/blob/main/CONTRIBUTING.md -->

## PR description

There is an issue when restarting Besu when a backward sync session is
running, since after the restart it is possible that the Consensus
client sends a FcU or a NewPayload for a block that is present in the
backward sync storage, but not yet imported, so not on the main chain,
but still the backward sync thinks it should not do anything with that
block, so it returns like it has completed the sync, but since the sync
is not done actually then the internal error that the finalize block is
not present.

The solution is to persist the backward sync status, so in case of a
restart, it can resume from where it was interrupted.

## Fixed Issue(s)
<!-- Please link to fixed issue(s) here using format: fixes #<issue
number> -->
<!-- Example: "fixes #2" -->

fixes hyperledger#5053 

## Documentation

- [x] I thought about documentation and added the `doc-change-required`
label to this PR if
[updates are
required](https://wiki.hyperledger.org/display/BESU/Documentation).

## Acceptance Tests (Non Mainnet)

- [x] I have considered running `./gradlew acceptanceTestNonMainnet`
locally if my PR affects non-mainnet modules.

## Changelog

- [x] I thought about the changelog and included a [changelog update if
required](https://wiki.hyperledger.org/display/BESU/Changelog).

---------

Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
Co-authored-by: Sally MacFarlane <macfarla.github@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working syncing TeamChupa GH issues worked on by Chupacabara Team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Internal error, backward sync completed but failed to import finalized block
6 participants