Description
For long-running sequences (for instance, one sequence took 13 seconds during troubleshooting), a deadlock can occur during the stale check. It happens as follows:
Sequence starts
2024-04-16T12:40:30.538+0200 [INFO] polygon.server.polybft: sequence started: height=15
Polybft receives a syncerBlockCh event informing it that the sequence/block is already inserted
2024-04-16T12:40:38.731+0200 [INFO] polygon.blockchain: block already inserted: block=15 source=consensus
Polybft starts stopping the sequence; the stop takes about 5 seconds
2024-04-16T12:40:43.450+0200 [INFO] polygon.server.polybft: sequence done: height=15
2024-04-16T12:40:43.450+0200 [INFO] polygon.server.polybft: canceled sequence: sequence=15
While the sequence is stopping, two stale checks occur in the stale checker. The second one causes the deadlock: the polybft thread is stuck stopping the sequence and cannot process sequenceShouldStop events, so the second event makes the stale checker thread wait forever.
2024-04-16T12:40:39.506+0200 [INFO] polygon.server.polybft: [staleSequenceCheck] checking for stale sequence
2024-04-16T12:40:39.506+0200 [INFO] polygon.server.polybft: [staleSequenceCheck] stale sequence detected: height=15 currentSequence=15
2024-04-16T12:40:42.492+0200 [INFO] polygon.server.polybft: [staleSequenceCheck] checking for stale sequence
2024-04-16T12:40:42.500+0200 [INFO] polygon.server.polybft: [staleSequenceCheck] stale sequence detected: height=15 currentSequence=15
When polybft finishes stopping the sequence, it proceeds to staleChecker.stopChecking() and waits for the stale checker to close gracefully. The stale checker, however, is waiting for polybft to process the sequenceShouldStop event, so both threads end up deadlocked.
The fix sends the sequenceShouldStop event only once, after which the stale checker timer is stopped. One event is enough, so additional events are not needed. The solution was verified by running the load test 15 times; all runs passed.
Changes include
Checklist
Testing