Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic after snapshot is aborted #11618

Closed
gengliqi opened this issue Dec 9, 2021 · 5 comments
Closed

Panic after snapshot is aborted #11618

gengliqi opened this issue Dec 9, 2021 · 5 comments
Assignees
Labels
affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. severity/major type/bug The issue is confirmed as a bug.

Comments

@gengliqi
Copy link
Member

gengliqi commented Dec 9, 2021

Affected version

v5.0.0-v5.0.5, v5.1.0-v5.1.3, v5.2.0-v5.2.3

Bug Report

[2021/12/03 10:54:18.660 +08:00] [INFO] [scheduler.rs:458] ["get snapshot failed"] [err="Error(Request(message: \"region 97957472 is missing\" region_not_found { region_id: 97957472 }))"] [cid=15770212]
[2021/12/03 10:54:39.170 +08:00] [INFO] [peer.rs:259] ["replicate peer"] [peer_id=97957474] [region_id=97957472]
[2021/12/03 10:54:39.170 +08:00] [INFO] [raft.rs:2609] ["switched to configuration"] [config="Configuration { voters: Configuration { incoming: Configuration { voters: {} }, outgoing: Configuration { voters: {} } }, learners: {}, learners_next: {}, auto_leave: false }"] [raft_id=97957474] [region_id=97957472]
[2021/12/03 10:54:39.170 +08:00] [INFO] [raft.rs:1092] ["became follower at term 0"] [term=0] [raft_id=97957474] [region_id=97957472]
[2021/12/03 10:54:39.170 +08:00] [INFO] [raft.rs:384] [newRaft] [peers="Configuration { incoming: Configuration { voters: {} }, outgoing: Configuration { voters: {} } }"] ["last term"=0] ["last index"=0] [applied=0] [commit=0] [term=0] [raft_id=97957474] [region_id=97957472]
[2021/12/03 10:54:39.170 +08:00] [INFO] [raw_node.rs:315] ["RawNode created with id 97957474."] [id=97957474] [raft_id=97957474] [region_id=97957472]
[2021/12/03 10:54:39.872 +08:00] [INFO] [raft.rs:1336] ["received a message with higher term from 97957473"] ["msg type"=MsgHeartbeat] [message_term=6] [term=0] [from=97957473] [raft_id=97957474] [region_id=97957472]
[2021/12/03 10:54:39.872 +08:00] [INFO] [raft.rs:1092] ["became follower at term 6"] [term=6] [raft_id=97957474] [region_id=97957472]
[2021/12/03 10:54:41.915 +08:00] [INFO] [snap.rs:247] ["saving snapshot file"] [file=/tidb2/tidb/tikv-20162/tidb-data/snap/rev_97957472_6_331_(default|lock|write).sst] [snap_key=97957472_6_331]
[2021/12/03 10:54:43.503 +08:00] [INFO] [raft_log.rs:614] ["log [committed=0, persisted=0, applied=0, unstable.offset=1, unstable.entries.len()=0] starts to restore snapshot [index: 331, term: 6]"] [snapshot_term=6] [snapshot_index=331] [log="committed=0, persisted=0, applied=0, unstable.offset=1, unstable.entries.len()=0"] [raft_id=97957474] [region_id=97957472]
[2021/12/03 10:54:43.507 +08:00] [INFO] [raft.rs:2609] ["switched to configuration"] [config="Configuration { voters: Configuration { incoming: Configuration { voters: {97957473, 97957474, 97957475} }, outgoing: Configuration { voters: {} } }, learners: {}, learners_next: {}, auto_leave: false }"] [raft_id=97957474] [region_id=97957472]
[2021/12/03 10:54:43.507 +08:00] [INFO] [raft.rs:2593] ["restored snapshot"] [snapshot_term=6] [snapshot_index=331] [last_term=6] [last_index=331] [commit=331] [raft_id=97957474] [region_id=97957472]
[2021/12/03 10:54:43.507 +08:00] [INFO] [raft.rs:2474] ["[commit: 331, term: 6] restored snapshot [index: 331, term: 6]"] [snapshot_term=6] [snapshot_index=331] [commit=331] [term=6] [raft_id=97957474] [region_id=97957472]
[2021/12/03 10:54:43.507 +08:00] [INFO] [peer_storage.rs:1244] ["begin to apply snapshot"] [peer_id=97957474] [region_id=97957472]
[2021/12/03 10:54:43.507 +08:00] [INFO] [peer_storage.rs:1287] ["apply snapshot with state ok"] [state="applied_index: 331 truncated_state { index: 331 term: 6 }"] [region="id: 97957472 start_key: 748000000000000FFF645F698000000000FF00000C0419A8D482FF1F00000001396665FF6636363563FF2D37FF6634662D3464FF62FF302D623838322DFFFF3762666166373031FFFF35613865000000FF00FB013538666434FF643438FF2D636265FF352D3433FF30632DFF386363612DFF3862FF356233393436FF39FF62643500000000FBFF0164323832333733FF31FF2D346333372DFF3435FF62372D3935FF63612DFF35616135FF32633335FF363337FF6500000000FB0380FF00000036E44E0000FE end_key: 748000000000000FFF645F698000000000FF00000C0419A8EC81FFD500000001386236FF6536343332FF2D63FF3630332D3432FF35FF382D396332312DFFFF3033396562333233FFFF61613536000000FF00FB013538666434FF643438FF2D636265FF352D3433FF30632DFF386363612DFF3862FF356233393436FF39FF62643500000000FBFF0139646262313739FF38FF2D313839392DFF3435FF39382D3866FF32652DFF31373637FF32653937FF383831FF6500000000FB0380FF0000003D6CAF5C00FE region_epoch { conf_ver: 1667 version: 9985 } peers { id: 97957473 store_id: 80511562 } peers { id: 97957474 store_id: 80512891 } peers { id: 97957475 store_id: 97823516 }"] [peer_id=97957474] [region_id=97957472]
[2021/12/03 10:54:44.352 +08:00] [INFO] [peer.rs:3171] ["snapshot is applied"] [region="id: 97957472 start_key: 748000000000000FFF645F698000000000FF00000C0419A8D482FF1F00000001396665FF6636363563FF2D37FF6634662D3464FF62FF302D623838322DFFFF3762666166373031FFFF35613865000000FF00FB013538666434FF643438FF2D636265FF352D3433FF30632DFF386363612DFF3862FF356233393436FF39FF62643500000000FBFF0164323832333733FF31FF2D346333372DFF3435FF62372D3935FF63612DFF35616135FF32633335FF363337FF6500000000FB0380FF00000036E44E0000FE end_key: 748000000000000FFF645F698000000000FF00000C0419A8EC81FFD500000001386236FF6536343332FF2D63FF3630332D3432FF35FF382D396332312DFFFF3033396562333233FFFF61613536000000FF00FB013538666434FF643438FF2D636265FF352D3433FF30632DFF386363612DFF3862FF356233393436FF39FF62643500000000FBFF0139646262313739FF38FF2D313839392DFF3435FF39382D3866FF32652DFF31373637FF32653937FF383831FF6500000000FB0380FF0000003D6CAF5C00FE region_epoch { conf_ver: 1667 version: 9985 } peers { id: 97957473 store_id: 80511562 } peers { id: 97957474 store_id: 80512891 } peers { id: 97957475 store_id: 97823516 }"] [peer_id=97957474] [region_id=97957472]
[2021/12/03 10:54:47.610 +08:00] [INFO] [region.rs:328] ["begin apply snap data"] [region_id=97957472]
[2021/12/03 10:54:49.501 +08:00] [INFO] [region.rs:401] ["apply new data"] [time_takes=185.850339ms] [region_id=97957472]
[2021/12/03 10:55:09.637 +08:00] [INFO] [peer.rs:862] ["deleting applied snap file"] [snap_file=97957472_6_331] [peer_id=97957474] [region_id=97957472]
[2021/12/03 10:55:39.703 +08:00] [INFO] [apply.rs:1393] ["execute admin command"] [command="cmd_type: BatchSplit splits { requests { split_key: 748000000000000FFF645F698000000000FF00000C0419A8DAB0FF3000000001376236FF3431306264FF2D65FF6663302D3464FF37FF372D386264622DFFFF6363326439353435FFFF37313634000000FF00FB013538666434FF643438FF2D636265FF352D3433FF30632DFF386363612DFF3862FF356233393436FF39FF62643500000000FBFF0132353033666365FF36FF2D623932332DFF3432FF61622D3965FF32612DFF31643339FF39666165FF373132FF6400000000FB0380FF00000042FE4F4900FE new_region_id: 97957583 new_peer_ids: 97957584 new_peer_ids: 97957585 new_peer_ids: 97957586 } right_derive: true }"] [index=3791] [term=6] [peer_id=97957474] [region_id=97957472]
[2021/12/03 10:55:39.703 +08:00] [INFO] [apply.rs:2228] ["split region"] [keys="key 748000000000000FFF645F698000000000FF00000C0419A8DAB0FF3000000001376236FF3431306264FF2D65FF6663302D3464FF37FF372D386264622DFFFF6363326439353435FFFF37313634000000FF00FB013538666434FF643438FF2D636265FF352D3433FF30632DFF386363612DFF3862FF356233393436FF39FF62643500000000FBFF0132353033666365FF36FF2D623932332DFF3432FF61622D3965FF32612DFF31643339FF39666165FF373132FF6400000000FB0380FF00000042FE4F4900FE"] [region="id: 97957472 start_key: 748000000000000FFF645F698000000000FF00000C0419A8D482FF1F00000001396665FF6636363563FF2D37FF6634662D3464FF62FF302D623838322DFFFF3762666166373031FFFF35613865000000FF00FB013538666434FF643438FF2D636265FF352D3433FF30632DFF386363612DFF3862FF356233393436FF39FF62643500000000FBFF0164323832333733FF31FF2D346333372DFF3435FF62372D3935FF63612DFF35616135FF32633335FF363337FF6500000000FB0380FF00000036E44E0000FE end_key: 748000000000000FFF645F698000000000FF00000C0419A8EC81FFD500000001386236FF6536343332FF2D63FF3630332D3432FF35FF382D396332312DFFFF3033396562333233FFFF61613536000000FF00FB013538666434FF643438FF2D636265FF352D3433FF30632DFF386363612DFF3862FF356233393436FF39FF62643500000000FBFF0139646262313739FF38FF2D313839392DFF3435FF39382D3866FF32652DFF31373637FF32653937FF383831FF6500000000FB0380FF0000003D6CAF5C00FE region_epoch { conf_ver: 1667 version: 9985 } peers { id: 97957473 store_id: 80511562 } peers { id: 97957474 store_id: 80512891 } peers { id: 97957475 store_id: 97823516 }"] [peer_id=97957474] [region_id=97957472]
[2021/12/03 10:57:30.733 +08:00] [INFO] [snap.rs:247] ["saving snapshot file"] [file=/tidb2/tidb/tikv-20162/tidb-data/snap/rev_97957472_6_6338_(default|lock|write).sst] [snap_key=97957472_6_6338]
[2021/12/03 10:57:31.519 +08:00] [INFO] [raft_log.rs:614] ["log [committed=5622, persisted=5622, applied=5622, unstable.offset=5623, unstable.entries.len()=0] starts to restore snapshot [index: 6338, term: 6]"] [snapshot_term=6] [snapshot_index=6338] [log="committed=5622, persisted=5622, applied=5622, unstable.offset=5623, unstable.entries.len()=0"] [raft_id=97957474] [region_id=97957472]
[2021/12/03 10:57:31.519 +08:00] [INFO] [raft.rs:2609] ["switched to configuration"] [config="Configuration { voters: Configuration { incoming: Configuration { voters: {97957473, 97957474, 97957475} }, outgoing: Configuration { voters: {} } }, learners: {97957681}, learners_next: {}, auto_leave: false }"] [raft_id=97957474] [region_id=97957472]
[2021/12/03 10:57:31.519 +08:00] [INFO] [raft.rs:2593] ["restored snapshot"] [snapshot_term=6] [snapshot_index=6338] [last_term=6] [last_index=6338] [commit=6338] [raft_id=97957474] [region_id=97957472]
[2021/12/03 10:57:31.519 +08:00] [INFO] [raft.rs:2474] ["[commit: 6338, term: 6] restored snapshot [index: 6338, term: 6]"] [snapshot_term=6] [snapshot_index=6338] [commit=6338] [term=6] [raft_id=97957474] [region_id=97957472]
[2021/12/03 10:57:31.519 +08:00] [INFO] [peer_storage.rs:1244] ["begin to apply snapshot"] [peer_id=97957474] [region_id=97957472]
[2021/12/03 10:57:31.519 +08:00] [INFO] [peer_storage.rs:1638] ["finish clear peer meta"] [takes=305.723µs] [raft_key=1] [apply_key=1] [meta_key=1] [region_id=97957472]
[2021/12/03 10:57:31.519 +08:00] [INFO] [peer_storage.rs:1287] ["apply snapshot with state ok"] [state="applied_index: 6338 commit_index: 5622 commit_term: 6 truncated_state { index: 6338 term: 6 }"] [region="id: 97957472 start_key: 748000000000000FFF645F698000000000FF00000C0419A8DAB0FF3000000001376236FF3431306264FF2D65FF6663302D3464FF37FF372D386264622DFFFF6363326439353435FFFF37313634000000FF00FB013538666434FF643438FF2D636265FF352D3433FF30632DFF386363612DFF3862FF356233393436FF39FF62643500000000FBFF0132353033666365FF36FF2D623932332DFF3432FF61622D3965FF32612DFF31643339FF39666165FF373132FF6400000000FB0380FF00000042FE4F4900FE end_key: 748000000000000FFF645F698000000000FF00000C0419A8EC81FFD500000001386236FF6536343332FF2D63FF3630332D3432FF35FF382D396332312DFFFF3033396562333233FFFF61613536000000FF00FB013538666434FF643438FF2D636265FF352D3433FF30632DFF386363612DFF3862FF356233393436FF39FF62643500000000FBFF0139646262313739FF38FF2D313839392DFF3435FF39382D3866FF32652DFF31373637FF32653937FF383831FF6500000000FB0380FF0000003D6CAF5C00FE region_epoch { conf_ver: 1668 version: 9986 } peers { id: 97957473 store_id: 80511562 } peers { id: 97957474 store_id: 80512891 } peers { id: 97957475 store_id: 97823516 } peers { id: 97957681 store_id: 80512890 role: Learner }"] [peer_id=97957474] [region_id=97957472]
[2021/12/03 10:57:33.703 +08:00] [INFO] [region.rs:328] ["begin apply snap data"] [region_id=97957472]
[2021/12/03 10:57:33.703 +08:00] [INFO] [peer.rs:3171] ["snapshot is applied"] [region="id: 97957472 start_key: 748000000000000FFF645F698000000000FF00000C0419A8DAB0FF3000000001376236FF3431306264FF2D65FF6663302D3464FF37FF372D386264622DFFFF6363326439353435FFFF37313634000000FF00FB013538666434FF643438FF2D636265FF352D3433FF30632DFF386363612DFF3862FF356233393436FF39FF62643500000000FBFF0132353033666365FF36FF2D623932332DFF3432FF61622D3965FF32612DFF31643339FF39666165FF373132FF6400000000FB0380FF00000042FE4F4900FE end_key: 748000000000000FFF645F698000000000FF00000C0419A8EC81FFD500000001386236FF6536343332FF2D63FF3630332D3432FF35FF382D396332312DFFFF3033396562333233FFFF61613536000000FF00FB013538666434FF643438FF2D636265FF352D3433FF30632DFF386363612DFF3862FF356233393436FF39FF62643500000000FBFF0139646262313739FF38FF2D313839392DFF3435FF39382D3866FF32652DFF31373637FF32653937FF383831FF6500000000FB0380FF0000003D6CAF5C00FE region_epoch { conf_ver: 1668 version: 9986 } peers { id: 97957473 store_id: 80511562 } peers { id: 97957474 store_id: 80512891 } peers { id: 97957475 store_id: 97823516 } peers { id: 97957681 store_id: 80512890 role: Learner }"] [peer_id=97957474] [region_id=97957472]
[2021/12/03 10:57:33.703 +08:00] [INFO] [apply.rs:3182] ["re-register to apply delegates"] [term=6] [peer_id=97957474] [region_id=97957472]
[2021/12/03 10:57:33.703 +08:00] [INFO] [peer.rs:3222] ["region changed after applying snapshot"] [region="id: 97957472 start_key: 748000000000000FFF645F698000000000FF00000C0419A8DAB0FF3000000001376236FF3431306264FF2D65FF6663302D3464FF37FF372D386264622DFFFF6363326439353435FFFF37313634000000FF00FB013538666434FF643438FF2D636265FF352D3433FF30632DFF386363612DFF3862FF356233393436FF39FF62643500000000FBFF0132353033666365FF36FF2D623932332DFF3432FF61622D3965FF32612DFF31643339FF39666165FF373132FF6400000000FB0380FF00000042FE4F4900FE end_key: 748000000000000FFF645F698000000000FF00000C0419A8EC81FFD500000001386236FF6536343332FF2D63FF3630332D3432FF35FF382D396332312DFFFF3033396562333233FFFF61613536000000FF00FB013538666434FF643438FF2D636265FF352D3433FF30632DFF386363612DFF3862FF356233393436FF39FF62643500000000FBFF0139646262313739FF38FF2D313839392DFF3435FF39382D3866FF32652DFF31373637FF32653937FF383831FF6500000000FB0380FF0000003D6CAF5C00FE region_epoch { conf_ver: 1668 version: 9986 } peers { id: 97957473 store_id: 80511562 } peers { id: 97957474 store_id: 80512891 } peers { id: 97957475 store_id: 97823516 } peers { id: 97957681 store_id: 80512890 role: Learner }"] [prev_region="id: 97957472 start_key: 748000000000000FFF645F698000000000FF00000C0419A8DAB0FF3000000001376236FF3431306264FF2D65FF6663302D3464FF37FF372D386264622DFFFF6363326439353435FFFF37313634000000FF00FB013538666434FF643438FF2D636265FF352D3433FF30632DFF386363612DFF3862FF356233393436FF39FF62643500000000FBFF0132353033666365FF36FF2D623932332DFF3432FF61622D3965FF32612DFF31643339FF39666165FF373132FF6400000000FB0380FF00000042FE4F4900FE end_key: 748000000000000FFF645F698000000000FF00000C0419A8EC81FFD500000001386236FF6536343332FF2D63FF3630332D3432FF35FF382D396332312DFFFF3033396562333233FFFF61613536000000FF00FB013538666434FF643438FF2D636265FF352D3433FF30632DFF386363612DFF3862FF356233393436FF39FF62643500000000FBFF0139646262313739FF38FF2D313839392DFF3435FF39382D3866FF32652DFF31373637FF32653937FF383831FF6500000000FB0380FF0000003D6CAF5C00FE region_epoch { conf_ver: 1667 version: 9986 } peers { id: 97957473 store_id: 80511562 } peers { id: 97957474 store_id: 80512891 } peers { id: 97957475 store_id: 97823516 }"] [peer_id=97957474] [region_id=97957472]
[2021/12/03 10:57:33.703 +08:00] [INFO] [region_info_accessor.rs:238] ["trying to create region but it already exists, try to update it"] [region_id=97957472]
[2021/12/03 10:57:33.703 +08:00] [INFO] [peer.rs:1801] ["receives gc message, trying to remove"] [to_peer="id: 97957474 store_id: 80512891"] [peer_id=97957474] [region_id=97957472]
[2021/12/03 10:57:33.703 +08:00] [INFO] [peer.rs:1801] ["receives gc message, trying to remove"] [to_peer="id: 97957474 store_id: 80512891"] [peer_id=97957474] [region_id=97957472]
[2021/12/03 10:57:33.762 +08:00] [INFO] [peer.rs:784] ["stale peer is applying snapshot, will destroy next time"] [peer_id=97957474] [region_id=97957472]
[2021/12/03 10:57:33.762 +08:00] [INFO] [peer.rs:784] ["stale peer is applying snapshot, will destroy next time"] [peer_id=97957474] [region_id=97957472]
[2021/12/03 10:57:37.142 +08:00] [WARN] [region.rs:428] ["applying snapshot is aborted"] [region_id=97957472]
[2021/12/03 10:58:14.973 +08:00] [INFO] [peer.rs:862] ["deleting applied snap file"] [snap_file=97957472_6_6338] [peer_id=97957474] [region_id=97957472]
[2021/12/03 11:09:40.014 +08:00] [WARN] [peer.rs:4067] ["leader missing longer than abnormal_leader_missing_duration"] [expect=10m] [peer_id=97957474] [region_id=97957472]
[2021/12/03 11:09:48.476 +08:00] [FATAL] [lib.rs:465] ["unexpected error getting unapplied entries [5623, 6339): Store(Compacted), raft_id: 97957474, region_id: 97957472"] [backtrace="stack backtrace:\n   6: raft::raft::Raft<T>::hup::{{closure}}\n             at /rust/git/checkouts/raft-rs-42b8049ef2e3af07/e6d28ef/src/raft.rs:1498\n   7: core::result::Result<T,E>::unwrap_or_else\n             at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/core/src/result.rs:1065\n      raft::raft::Raft<T>::hup\n             at /rust/git/checkouts/raft-rs-42b8049ef2e3af07/e6d28ef/src/raft.rs:1494\n   8: raft::raft::Raft<T>::step\n             at /rust/git/checkouts/raft-rs-42b8049ef2e3af07/e6d28ef/src/raft.rs:1421\n   9: raft::raft::Raft<T>::tick_election\n             at /rust/git/checkouts/raft-rs-42b8049ef2e3af07/e6d28ef/src/raft.rs:1049\n      raft::raft::Raft<T>::tick\n             at /rust/git/checkouts/raft-rs-42b8049ef2e3af07/e6d28ef/src/raft.rs:1031\n      raft::raw_node::RawNode<T>::tick\n             at /rust/git/checkouts/raft-rs-42b8049ef2e3af07/e6d28ef/src/raw_node.rs:343\n  10: raftstore::store::fsm::peer::PeerFsmDelegate<EK,ER,T>::on_raft_base_tick\n             at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tikv/components/raftstore/src/store/fsm/peer.rs:1232\n      raftstore::store::fsm::peer::PeerFsmDelegate<EK,ER,T>::on_tick\n             at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tikv/components/raftstore/src/store/fsm/peer.rs:775\n      raftstore::store::fsm::peer::PeerFsmDelegate<EK,ER,T>::handle_msgs\n             at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tikv/components/raftstore/src/store/fsm/peer.rs:618\n  11: <raftstore::store::fsm::store::RaftPoller<EK,ER,T> as batch_system::batch::PollHandler<raftstore::store::fsm::peer::PeerFsm<EK,ER>,raftstore::store::fsm::store::StoreFsm<EK>>>::handle_normal\n             at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tikv/components/raftstore/src/store/fsm/store.rs:929\n  12: batch_system::batch::Poller<N,C,Handler>::poll\n             at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tikv/components/batch-system/src/batch.rs:414\n  13: batch_system::batch::BatchSystem<N,C>::start_poller::{{closure}}\n             at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tikv/components/batch-system/src/batch.rs:541\n      std::sys_common::backtrace::__rust_begin_short_backtrace\n             at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/std/src/sys_common/backtrace.rs:125\n  14: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}\n             at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/std/src/thread/mod.rs:476\n      <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once\n             at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/std/src/panic.rs:347\n      std::panicking::try::do_call\n             at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/std/src/panicking.rs:401\n      std::panicking::try\n             at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/std/src/panicking.rs:365\n      std::panic::catch_unwind\n             at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/std/src/panic.rs:434\n      std::thread::Builder::spawn_unchecked::{{closure}}\n             at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/std/src/thread/mod.rs:475\n      core::ops::function::FnOnce::call_once{{vtable.shim}}\n             at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/core/src/ops/function.rs:227\n  15: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once\n             at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/alloc/src/boxed.rs:1572\n      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once\n             at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/alloc/src/boxed.rs:1572\n      std::sys::unix::thread::Thread::new::thread_start\n             at library/std/src/sys/unix/thread.rs:91\n  16: start_thread\n  17: __clone\n"] [location=/rust/git/checkouts/raft-rs-42b8049ef2e3af07/e6d28ef/src/raft.rs:1498] [thread_name=raftstore-80512891-1]

TiKV's version is 5.2.2.
If the snapshot is aborted, the peer can tick because is_applying_snapshot returns false here.

fn on_raft_base_tick(&mut self) {
if self.fsm.peer.pending_remove {
self.fsm.peer.mut_store().flush_cache_metrics();
return;
}
// When having pending snapshot, if election timeout is met, it can't pass
// the pending conf change check because first index has been updated to
// a value that is larger than last index.
if self.fsm.peer.is_applying_snapshot() || self.fsm.peer.has_pending_snapshot() {
// need to check if snapshot is applied.
self.fsm.has_ready = true;
self.fsm.missing_ticks = 0;
self.register_raft_base_tick();
return;
}

After the election timeout, hup will be called and panic will happen because the log between applied_index and last_index does not exist.

In release-4.0, since the advance_apply is called after the snapshot is just persisted then the applied_index is updated, this panic can not happen. Actually, it’s just good luck.

In 5.3 and master, the is_applying_snapshot is replaced with is_handling_snapshot. The is_handling_snapshot returns true even when the snapshot is aborted. Therefore, panic won't happen either.

In most cases, the peer will be destroyed soon if the snapshot is aborted. However, when the network is partitioned for a while duration that time, the peer may stay exist for a long time. This is because the peer needs an extra message to trigger the destroy process.

We can fix this bug by moving the position of advance_apply like release-4.0 but the root cause is not completely resolved. There may be some other bugs hidden under the iceberg.

We should totally fix this bug(maybe bugs) by destroying peer after snapshot is aborted.

@gengliqi
Copy link
Member Author

gengliqi commented Dec 9, 2021

/cc @BusyJay @NingLin-P

@gengliqi gengliqi added affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. type/bug The issue is confirmed as a bug. labels Dec 9, 2021
@zhangjinpeng87
Copy link
Member

any update?

@gengliqi
Copy link
Member Author

I will fix it in a simple way first.

tonyxuqqi added a commit to tonyxuqqi/tikv that referenced this issue Jan 5, 2022
tonyxuqqi added a commit to tonyxuqqi/tikv that referenced this issue Jan 5, 2022
Signed-off-by: tonyxuqqi <tonyxuqi@outlook.com>
@tonyxuqqi
Copy link
Contributor

/assign tonyxuqqi

tonyxuqqi added a commit to tonyxuqqi/tikv that referenced this issue Jan 6, 2022
Signed-off-by: tonyxuqqi <tonyxuqi@outlook.com>
tonyxuqqi added a commit to tonyxuqqi/tikv that referenced this issue Jan 7, 2022
Signed-off-by: tonyxuqqi <tonyxuqi@outlook.com>
tonyxuqqi added a commit to tonyxuqqi/tikv that referenced this issue Jan 7, 2022
Signed-off-by: tonyxuqqi <tonyxuqi@outlook.com>
ti-chi-bot added a commit that referenced this issue Apr 13, 2022
fix #11618, close #11618

Signed-off-by: tonyxuqqi <tonyxuqi@outlook.com>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
@gengliqi
Copy link
Member Author

close due to #11796

ti-chi-bot pushed a commit that referenced this issue Dec 9, 2022
…12358)

close #11618, fix #11618, ref #11796

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>

Co-authored-by: tonyxuqqi <tonyxuqi@outlook.com>
Co-authored-by: Liqi Geng <gengliqiii@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. severity/major type/bug The issue is confirmed as a bug.
Projects
None yet
Development

No branches or pull requests

6 participants