Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Synchronize access to block header #6143

Merged
merged 9 commits into from
Nov 14, 2023

Conversation

matthew1001
Copy link
Contributor

PR description

As outlined in the comment in #6140 it appears to be the case that access to the block header isn't thread safe when the block being requested is a newly added chain head.

Fixed Issue(s)

Fixes #6140

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Copy link

github-actions bot commented Nov 8, 2023

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.
  • I thought about the changelog and included a changelog update if required.
  • If my PR includes database changes (e.g. KeyValueSegmentIdentifier) I have thought about compatibility and performed forwards and backwards compatibility tests

@garyschulte
Copy link
Contributor

garyschulte commented Nov 8, 2023

Should we need to synchronize blockheader access? This isn't something that typically needs concurrency safety, except of course near-head.

Is the transaction spam using block number or blockhash? I can understand chain head not available if the tx spammer is getting ahead of itself by predicting what the blocknumber should be based on block time. But if it is blockhash, where is the block builder getting the blockhash / block number ? which one are we using in this case

Without context, this seems like an issue that should be solved by ensuring we commit and persist the blockchain rocksdb transaction rather than paying a synchronization overhead cost when trying to access block headers.

edit: Has this fixed the issue in your test case? If so, perhaps we should overload with synchonized access, that way we can have synchronization when necessary, and not pay the cost for operations that do not require it

@matthew1001
Copy link
Contributor Author

Yeah it's exactly the near-head case that the problem occurs. Basically the chain header is updated (inside a synchronized function) and then storage is updated and committed (in the same synchronized function).

But the getter for the chain head can be called at any time, and because it's not synchronized it can retrieve the new value before the storage commit has happened.

So possibly an alternative to synchronizing on getBlockHeader() would be to synchronize on getChainHeadHash() (and other related getChainHead*() methods probably) if getting the chain head is less frequent compared to getBlockHeader(). What do you think?

@matthew1001
Copy link
Contributor Author

Going to run some tests with a change to synchronizing getChainHead*, not getBlockHeader, and if it fixes the issue I'll update the PR

@matthew1001
Copy link
Contributor Author

@garyschulte I'm running my tests again using your suggestion of overloading getBlockHeader by adding getBlockHeaderSafe which is synchronized, and the caller can choose to use in the rare event that the block header for the latest block isn't available. I'll update here once I'm happy that the issue isn't showing any more. Let me know what you think of the changes.

Copy link
Contributor

@garyschulte garyschulte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good to me, but I think we can DRY it up a bit

// Optimistically get the chain head. getChainHeadBlockHeader() doesn't take any locks,
// which might mean that the latest block is still being committed to storage. If this
// call fails try the synchronized alternative, and if that fails give up.
BlockHeader chainHeadBlockHeader = getChainHeadBlockHeader().orElse(null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about we leave this as it is and have this logic directly in getChainBlockHeader()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's a good thought. I'll tidy it up and push a new commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New commits pushed @garyschulte. I'll mark as auto-merge once you're happy with them.

Comment on lines 563 to 567
private Optional<BlockHeader> getChainHeadBlockHeaderSafe() {
final MutableBlockchain blockchain = protocolContext.getBlockchain();
return blockchain.getBlockHeaderSafe(blockchain.getChainHeadHash());
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about instead of adding here we add the safe call to the existing getChainHeadBlockHeader(), e.g.:

  private Optional<BlockHeader> getChainHeadBlockHeader() {
    final MutableBlockchain blockchain = protocolContext.getBlockchain();
    return blockchain.getBlockHeader(blockchain.getChainHeadHash())
        .or(() -> blockchain.getBlockHeaderSafe(blockchain.getChainHeadHash()));
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See latest commits

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
@matthew1001 matthew1001 enabled auto-merge (squash) November 12, 2023 13:18
@fab-10
Copy link
Contributor

fab-10 commented Nov 13, 2023

I like the approach of the *Safe version of the method, let's monitor this to see if the issue is gone, otherwise as suggested before we can implement a retry policy, in case of a transient error, like chain head not found.

@matthew1001
Copy link
Contributor Author

The current PR definitely fixes the issue - no failures after >10 hours running a single validator at 50TPS. Question is whether we stick with the refactor I did under 1641b23 or revert to having separate getChainHeadBlockHeader() and getChainHeadBlockHeaderSafe(). I don't have a strong opinion either way. With the latest commit, getChainHeadBlockHeader() will make 2 attempts to get a block header if the block being requested genuinely doesn't exist. But for any block < chain head (and most calls for the chain head block except in very occasional circumstances) this shouldn't affect performance. So I'm happy sticking with the latest refactor, but likewise happy to revert.

@garyschulte any thoughts? It would be nice to get this into 23.10.2 as it causes unpleasant behaviour if you hit it.

Copy link
Contributor

@garyschulte garyschulte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 LGTM, especially if this solves the case you are encountering.

@matthew1001 matthew1001 merged commit 331ddcf into hyperledger:main Nov 14, 2023
19 checks passed
jflo pushed a commit to jflo/besu that referenced this pull request Nov 17, 2023
* Synchronize access to block header

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

* Add 'safe' version of getBlockHeader

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

* Move retry with lock into getChainHeadBlockHeader()

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

* Reinstate 'final' modifier

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

---------

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Justin Florentine <justin+github@florentine.us>
jflo pushed a commit to jflo/besu that referenced this pull request Nov 20, 2023
* Synchronize access to block header

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

* Add 'safe' version of getBlockHeader

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

* Move retry with lock into getChainHeadBlockHeader()

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

* Reinstate 'final' modifier

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

---------

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Justin Florentine <justin+github@florentine.us>
jflo pushed a commit to jflo/besu that referenced this pull request Dec 4, 2023
* Synchronize access to block header

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

* Add 'safe' version of getBlockHeader

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

* Move retry with lock into getChainHeadBlockHeader()

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

* Reinstate 'final' modifier

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

---------

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Justin Florentine <justin+github@florentine.us>
matthew1001 added a commit to kaleido-io/besu that referenced this pull request Jan 12, 2024
* Synchronize access to block header

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

* Add 'safe' version of getBlockHeader

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

* Move retry with lock into getChainHeadBlockHeader()

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

* Reinstate 'final' modifier

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

---------

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Chain head block is sometimes not available on the node that actually mined the block
4 participants