Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Execution] Fix stop control #5327

Merged
merged 1 commit into from
Feb 1, 2024
Merged

[Execution] Fix stop control #5327

merged 1 commit into from
Feb 1, 2024

Conversation

zhangchiqing
Copy link
Member

This PR fixes the stop control when ingestion engine is about to execute next block. See details in the comments of the PR.

@codecov-commenter
Copy link

codecov-commenter commented Jan 30, 2024

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (adbf9d0) 55.70% compared to head (c21cda3) 55.70%.

Files Patch % Lines
engine/execution/ingestion/engine.go 44.44% 4 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5327   +/-   ##
=======================================
  Coverage   55.70%   55.70%           
=======================================
  Files        1004     1003    -1     
  Lines       97080    97017   -63     
=======================================
- Hits        54082    54048   -34     
+ Misses      38929    38899   -30     
- Partials     4069     4070    +1     
Flag Coverage Δ
unittests 55.70% <44.44%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zhangchiqing zhangchiqing marked this pull request as ready for review January 30, 2024 20:27
@zhangchiqing zhangchiqing requested review from janezpodhostnik, peterargue and sideninja and removed request for ramtinms January 30, 2024 20:27
Copy link
Contributor

@peterargue peterargue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code looks OK. given this has been working, I'm assuming this is a corner case. can you document this some more either in the comments or in this PR

@@ -205,7 +205,8 @@ func (e *Engine) BlockProcessable(b *flow.Header, _ *flow.QuorumCertificate) {

// TODO: this should not be blocking: https://github.com/onflow/flow-go/issues/4400

// skip if stopControl tells to skip
// skip if stopControl tells to skip, so that we can avoid fetching collections
// for this block
if !e.stopControl.ShouldExecuteBlock(b) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, this is the only place we check.

@@ -363,6 +364,12 @@ func (e *Engine) executeBlock(
ctx context.Context,
executableBlock *entity.ExecutableBlock,
) {

// don't execute the block if the stop control says no
if !e.stopControl.ShouldExecuteBlock(executableBlock.Block.Header) {
Copy link
Member Author

@zhangchiqing zhangchiqing Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stop control should check again here, because otherwise there is an edge case that might execute blocks above the stop height. See the comment on L457

The edge case is that, when a block becomes processable, the stop height might not be set, so it's added to the queue, however, when it becomes executable and is able to execute, the stop height is set, and should not be executed. Without this check, it will be executed, but should not.

@@ -454,8 +463,6 @@ func (e *Engine) executeBlock(
e.executionDataPruner.NotifyFulfilledHeight(executableBlock.Height())
}

e.stopControl.OnBlockExecuted(executableBlock.Block.Header)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stopControl.OnBlockExecuted is to report a block has been executed and triggers stop control to check whether it should stop/crash or not.

However, this is currently called after e.onBlockExecuted is called, which checks if the child block (next block) can be executed and executes if completed. So imaging the following case:

Given two blocks: A <- B and we set stop height as B, meaning stop as soon as A is executed, and don't execute B. OK, now both block A and B's collections have been received, after A is executed, B becomes ready to be executed, if onBlockExecued(A) is called before stopControl.OnBlockExecuted, then it will execute block B in another go routine (which shouldn't). But the reason we never ran into this, is probably because before B is executed, stop.OnBlockExecuted might have been called and crashed the node. However, B technically still have a chance to be executed, I didn't notice this until looking into a flakey test TestStopAtHeightWhenExecutedBeforeFinalized which captured this edge case.

So my solution here is to 1) move this line up before the onBlockExecuted, so that we let stop control check stop condition first. And 2) add ShouldExecuteBlock into the executeBlock function, so that even if stop control decides not to crash, and the function onBlockExecuted found the next block has received its collection and can be executed, and we can still stop the next block from being executed.

@zhangchiqing
Copy link
Member Author

@peterargue sorry, I just realized my comments in the PR was in "pending". I just published them.

@zhangchiqing zhangchiqing added this pull request to the merge queue Feb 1, 2024
Merged via the queue into master with commit f1f4ec5 Feb 1, 2024
50 of 51 checks passed
@zhangchiqing zhangchiqing deleted the leo/fix-stop-control branch February 1, 2024 17:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants