Executor exit with unknown reason while running performance test #588
Comments
dmesg log from when the executor was killed:
Recommend to debug
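Problem analysis
Postman caches the result of every Executor execution in the `completed` structure of Backlogs and, after receiving a RichStatus from Chain, clears the cached entries. The interface it calls for this is prune:
pub fn prune(&mut self, height: u64) {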
// Importance guard: we must keep the executed result of the recent
// 2 height(current_height - 1, current_height - 2), which used when
// postman check arrived proof via `Postman::check_proof`
if height + 2 < self.get_current_height() {
self.completed = self.completed.split_off(&height);
}
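}
For the Executor to validate transactions correctly, the cache must retain the execution results of at least 3 blocks: [current_height, current_height - 1, current_height - 2]. However, the code logic above does not satisfy this requirement: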
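A minimal sketch of how the original guard behaves, based on the explanation given later in this thread that `height` is normally the chain's next block height. It models the cache as a plain `BTreeMap` and assumes the Executor and Chain advance in lockstep; `prune_old` and `cache_size_after` are hypothetical names used only for illustration, not part of the project.

```rust
use std::collections::BTreeMap;

/// The original guard from this issue, applied to a plain BTreeMap.
/// Returns true if the cache was actually pruned.
fn prune_old(completed: &mut BTreeMap<u64, ()>, current_height: u64, height: u64) -> bool {
    if height + 2 < current_height {
        // split_off keeps every entry whose key is >= height.
        *completed = completed.split_off(&height);
        true
    } else {
        false
    }
}

/// Executes `blocks` blocks one by one and asks for a prune after each.
/// Assumption: Chain has already persisted each executed block, so the
/// height passed to prune is current_height + 1.
fn cache_size_after(blocks: u64) -> usize {
    let mut completed = BTreeMap::new();
    for current_height in 0..blocks {
        completed.insert(current_height, ());
        prune_old(&mut completed, current_height, current_height + 1);
    }
    completed.len()
}
```

Under that assumption `height + 2 < current_height` is never true, so `cache_size_after(n)` returns `n`: every executed result stays in the cache.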
Proposed fix
pub fn prune(&mut self, height: u64) {
// Importance guard: we must keep the executed result of the recent
// 3 height(current_height, current_height - 1, current_height - 2),
// which used when postman check arrived proof via `Postman::check_proof`
if self.get_current_height() > 2 {
let split_height = min(height, self.get_current_height() - 2);
self.completed = self.completed.split_off(&split_height);
}
}
Test cases
#[test]
fn test_update_rich_status_0() {
let mut postman = helpers::generate_postman(5, Default::default());
backlogs_prepare(&mut postman);
let mut rich_status = RichStatus::new();
rich_status.set_height(0);
postman.update_by_rich_status(&rich_status);
// block height in rich_status is from Chain, that means database's block height.
// block height in postman, that means executed block height which cache in Executor.
// Testcase 0: block_height[chain] = 0, block_height[executor] = 5.
// Expected: remove the block 0 from cache, but not other block.
assert!(postman.backlogs.get_completed_result(0).is_none());
assert!(postman.backlogs.get_completed_result(1).is_some());
assert!(postman.backlogs.get_completed_result(2).is_some());
assert!(postman.backlogs.get_completed_result(3).is_some());
assert!(postman.backlogs.get_completed_result(4).is_some());
assert!(postman.backlogs.get_completed_result(5).is_some());
}
#[test]
fn test_update_rich_status_1() {
let mut postman = helpers::generate_postman(5, Default::default());
backlogs_prepare(&mut postman);
let mut rich_status = RichStatus::new();
rich_status.set_height(1);
postman.update_by_rich_status(&rich_status);
// block height in rich_status is from Chain, that means database's block height.
// block height in postman, that means executed block height which cache in Executor.
// Testcase 1: block_height[chain] = 1, block_height[executor] = 5.
// Expected: remove the block 0, 1 from cache, but not other block.
assert!(postman.backlogs.get_completed_result(0).is_none());
assert!(postman.backlogs.get_completed_result(1).is_none());
assert!(postman.backlogs.get_completed_result(2).is_some());
assert!(postman.backlogs.get_completed_result(3).is_some());
assert!(postman.backlogs.get_completed_result(4).is_some());
assert!(postman.backlogs.get_completed_result(5).is_some());
}
#[test]
fn test_update_rich_status_2() {
let mut postman = helpers::generate_postman(5, Default::default());
backlogs_prepare(&mut postman);
let mut rich_status = RichStatus::new();
rich_status.set_height(2);
postman.update_by_rich_status(&rich_status);
// block height in rich_status is from Chain, that means database's block height.
// block height in postman, that means executed block height which cache in Executor.
// Testcase 2: block_height[chain] = 2, block_height[executor] = 5.
// Expected: remove the block 0, 1, 2 from cache, but not other block.
assert!(postman.backlogs.get_completed_result(0).is_none());
assert!(postman.backlogs.get_completed_result(1).is_none());
assert!(postman.backlogs.get_completed_result(2).is_none());
assert!(postman.backlogs.get_completed_result(3).is_some());
assert!(postman.backlogs.get_completed_result(4).is_some());
assert!(postman.backlogs.get_completed_result(5).is_some());
}
#[test]
fn test_update_rich_status_3() {
let mut postman = helpers::generate_postman(5, Default::default());
backlogs_prepare(&mut postman);
let mut rich_status = RichStatus::new();
rich_status.set_height(3);
postman.update_by_rich_status(&rich_status);
// block height in rich_status is from Chain, that means database's block height.
// block height in postman, that means executed block height which cache in Executor.
// Testcase 3: block_height[chain] = 3, block_height[executor] = 5.
// Expected: postman needs to keep at least 3 block in cache, so remove the block 0, 1, 2 from cache, but not other block.
assert!(postman.backlogs.get_completed_result(0).is_none());
assert!(postman.backlogs.get_completed_result(1).is_none());
assert!(postman.backlogs.get_completed_result(2).is_none());
assert!(postman.backlogs.get_completed_result(3).is_some());
assert!(postman.backlogs.get_completed_result(4).is_some());
assert!(postman.backlogs.get_completed_result(5).is_some());
}
#[test]
fn test_update_rich_status_4() {
let mut postman = helpers::generate_postman(5, Default::default());
backlogs_prepare(&mut postman);
let mut rich_status = RichStatus::new();
rich_status.set_height(4);
postman.update_by_rich_status(&rich_status);
// block height in rich_status is from Chain, that means database's block height.
// block height in postman, that means executed block height which cache in Executor.
// Testcase 4: block_height[chain] = 4, block_height[executor] = 5.
// Expected: postman needs to keep at least 3 block in cache, so remove the block 0, 1, 2 from cache, but not other block.
assert!(postman.backlogs.get_completed_result(0).is_none());
assert!(postman.backlogs.get_completed_result(1).is_none());
assert!(postman.backlogs.get_completed_result(2).is_none());
assert!(postman.backlogs.get_completed_result(3).is_some());
assert!(postman.backlogs.get_completed_result(4).is_some());
assert!(postman.backlogs.get_completed_result(5).is_some());
}
#[test]
fn test_update_rich_status_5() {
let mut postman = helpers::generate_postman(5, Default::default());
backlogs_prepare(&mut postman);
let mut rich_status = RichStatus::new();
rich_status.set_height(5);
postman.update_by_rich_status(&rich_status);
// block height in rich_status is from Chain, that means database's block height.
// block height in postman, that means executed block height which cache in Executor.
// Testcase 5: block_height[chain] = 5, block_height[executor] = 5.
// Expected: postman needs to keep at least 3 block in cache, so remove the block 0, 1, 2 from cache, but not other block.
assert!(postman.backlogs.get_completed_result(0).is_none());
assert!(postman.backlogs.get_completed_result(1).is_none());
assert!(postman.backlogs.get_completed_result(2).is_none());
assert!(postman.backlogs.get_completed_result(3).is_some());
assert!(postman.backlogs.get_completed_result(4).is_some());
assert!(postman.backlogs.get_completed_result(5).is_some());
}
fn backlogs_prepare(postman: &mut Postman) {
let execute_result_0 = generate_executed_result(0);
let execute_result_1 = generate_executed_result(1);
let execute_result_2 = generate_executed_result(2);
let execute_result_3 = generate_executed_result(3);
let execute_result_4 = generate_executed_result(4);
let execute_result_5 = generate_executed_result(5);
postman
.backlogs
.insert_completed_result(0, execute_result_0);
postman
.backlogs
.insert_completed_result(1, execute_result_1);
postman
.backlogs
.insert_completed_result(2, execute_result_2);
postman
.backlogs
.insert_completed_result(3, execute_result_3);
postman
.backlogs
.insert_completed_result(4, execute_result_4);
postman
.backlogs
.insert_completed_result(5, execute_result_5);
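}
"height + 2 < self.get_current_height()": is it that self.get_current_height() = height + 3 never holds?
When the chain runs normally, height (the next block height) should be (current height + 1), so the original condition: if height + 2 < self.get_current_height() {} does not hold. In other words, while the chain is running normally the cache can never be pruned. Only in a very special case, after the chain has been unexpectedly deleted, would the condition occur
OK, I misunderstood just now.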
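The pruning rule from the proposed fix can be checked against the test expectations with a standalone model. This is only a sketch: `Backlogs` here is a hypothetical stand-in holding just a height and a `BTreeMap`, and it assumes the caller passes the chain height plus one to `prune`, as the explanation above and the test expectations suggest.

```rust
use std::cmp::min;
use std::collections::BTreeMap;

/// Hypothetical stand-in for the real Backlogs; only what the rule needs.
struct Backlogs {
    current_height: u64,
    completed: BTreeMap<u64, String>,
}

impl Backlogs {
    /// Builds a cache holding results for heights 0..=current_height.
    fn with_blocks(current_height: u64) -> Self {
        let completed = (0..=current_height)
            .map(|h| (h, format!("result-{}", h)))
            .collect();
        Backlogs { current_height, completed }
    }

    /// Same rule as the proposed fix: never split above current_height - 2,
    /// so the three most recent results always survive.
    fn prune(&mut self, height: u64) {
        if self.current_height > 2 {
            let split_height = min(height, self.current_height - 2);
            // split_off keeps every entry whose key is >= split_height.
            self.completed = self.completed.split_off(&split_height);
        }
    }

    fn cached_heights(&self) -> Vec<u64> {
        self.completed.keys().cloned().collect()
    }
}

/// Executor at height 5; Chain reports `chain_height`.
/// Assumption: prune is called with chain_height + 1 (the next block height).
fn cached_after_rich_status(chain_height: u64) -> Vec<u64> {
    let mut backlogs = Backlogs::with_blocks(5);
    backlogs.prune(chain_height + 1);
    backlogs.cached_heights()
}
```

With this model, `cached_after_rich_status(0)` keeps heights 1 to 5, while any chain height from 2 to 5 keeps exactly heights 3, 4 and 5, which matches the assertions in the test cases.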
Description
Executor exit with unknown reason while running performance test
Steps to Reproduce
Run CI performance test
Expected behavior: [What you expect to happen]
Everything Ok.
Actual behavior: [What actually happens]
Exit with unknown reason.
cita-executor.log:
Reproduce how often: [What percentage of the time does it reproduce?]
High probability.
Versions
Develop.