Fix for ERC20 load test freeze #196

Merged Apr 17, 2024 (1 commit)

oliverbundalo

Description

For long-running sequences (for instance, one sequence took 13 seconds during troubleshooting), a deadlock can occur during the stale check. It happens as follows:

  1. Sequence starts
    2024-04-16T12:40:30.538+0200 [INFO] polygon.server.polybft: sequence started: height=15

  2. Polybft receives a syncerBlockCh event informing it that the sequence/block is already inserted
    2024-04-16T12:40:38.731+0200 [INFO] polygon.blockchain: block already inserted: block=15 source=consensus

  3. Polybft starts stopping the sequence; the stop takes 5 seconds
    2024-04-16T12:40:43.450+0200 [INFO] polygon.server.polybft: sequence done: height=15
    2024-04-16T12:40:43.450+0200 [INFO] polygon.server.polybft: canceled sequence: sequence=15

  4. While the sequence is stopping, 2 stale checks occur in the stale checker. The 2nd results in a deadlock: the polybft thread is stuck stopping the sequence and is not available to process sequenceShouldStop events, so the 2nd event makes the stale checker thread wait forever.
    2024-04-16T12:40:39.506+0200 [INFO] polygon.server.polybft: [staleSequenceCheck] checking for stale sequence
    2024-04-16T12:40:39.506+0200 [INFO] polygon.server.polybft: [staleSequenceCheck] stale sequence detected: height=15 currentSequence=15
    2024-04-16T12:40:42.492+0200 [INFO] polygon.server.polybft: [staleSequenceCheck] checking for stale sequence
    2024-04-16T12:40:42.500+0200 [INFO] polygon.server.polybft: [staleSequenceCheck] stale sequence detected: height=15 currentSequence=15

  5. When polybft finishes stopping the sequence, it proceeds to staleChecker.stopChecking() and waits for the stale checker to close gracefully, but the stale checker is waiting for polybft to process sequenceShouldStop events, so both threads end up deadlocked.

The fix sends the sequenceShouldStop event only once, after which the stale checker timer is stopped. One event is enough, so additional events are not needed. The solution was verified by running the load test 15 times; all runs passed.
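
For illustration only, here is a minimal, self-contained sketch of the deadlock and of the send-once idea. It is not the code from this PR: the `checker` type, `start`, `stopWithin`, and `simulate` are hypothetical names, the event channel is assumed to have a buffer of 1 (consistent with the 2nd event being the one that blocks), every tick is treated as "stale sequence detected", and a timeout replaces the real, unbounded wait so the example terminates.

```go
package main

import (
	"fmt"
	"time"
)

// checker is a simplified, hypothetical stand-in for the stale sequence checker.
type checker struct {
	shouldStop chan struct{} // events for the consensus goroutine (assumed buffer of 1)
	stop       chan struct{} // closed to ask the checker to exit
	stopped    chan struct{} // closed by the checker once it has exited
}

// start launches the checker loop. In this sketch every tick counts as
// "stale sequence detected". With sendOnce=false each tick sends an event;
// with sendOnce=true the ticker is stopped after the first event.
func start(interval time.Duration, sendOnce bool) *checker {
	c := &checker{
		shouldStop: make(chan struct{}, 1),
		stop:       make(chan struct{}),
		stopped:    make(chan struct{}),
	}
	ticker := time.NewTicker(interval)
	tick := ticker.C

	go func() {
		defer close(c.stopped)
		defer ticker.Stop()
		for {
			select {
			case <-c.stop:
				return
			case <-tick:
				c.shouldStop <- struct{}{} // blocks once the buffer is full
				if sendOnce {
					ticker.Stop()
					tick = nil // a nil channel never fires, so no second send
				}
			}
		}
	}()
	return c
}

// stopWithin mimics a stopChecking() call: it signals the checker and waits
// for it to exit, reporting false if that does not happen before the timeout.
func (c *checker) stopWithin(timeout time.Duration) bool {
	close(c.stop)
	select {
	case <-c.stopped:
		return true
	case <-time.After(timeout):
		return false
	}
}

// simulate plays the consensus side: it is busy "stopping the sequence"
// (never reading shouldStop) and only then asks the checker to stop.
func simulate(sendOnce bool) bool {
	c := start(5*time.Millisecond, sendOnce)
	time.Sleep(50 * time.Millisecond) // several ticks pass, nobody reads events
	return c.stopWithin(200 * time.Millisecond)
}

func main() {
	fmt.Println("every tick sends, stop completes:", simulate(false))
	fmt.Println("single send, stop completes:", simulate(true))
}
```

Under these assumptions the first case reports `false`: the checker goroutine is stuck in the second send, so it never sees the stop signal. The second case reports `true`. Without the timeout, the first case would hang exactly as described in steps 4 and 5.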

Changes include

  • Bugfix (non-breaking change that solves an issue)
  • Hotfix (change that solves an urgent issue, and requires immediate attention)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)

Checklist

  • I have assigned this PR to myself
  • I have added at least 1 reviewer
  • I have added the relevant labels
  • I have updated the official documentation
  • I have added sufficient documentation in code

Testing

  • I have tested this code with the official test suite
  • I have tested this code manually

@oliverbundalo oliverbundalo merged commit ebee469 into develop Apr 17, 2024
10 checks passed
@oliverbundalo oliverbundalo deleted the bug/erc20-freeze-fix branch April 17, 2024 11:45
Labels
bug-fix Fix for a bug