Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release test: schrodinger/bank true failed with error region range overlapped #3435

Closed
solotzg opened this issue Nov 15, 2021 · 2 comments · Fixed by #4136
Closed

release test: schrodinger/bank true failed with error region range overlapped #3435

solotzg opened this issue Nov 15, 2021 · 2 comments · Fixed by #4136

Comments

@solotzg
Copy link
Contributor

solotzg commented Nov 15, 2021

https://ci.pingcap.net/blue/organizations/jenkins/tiflash_schrodinger_test/detail/tiflash_schrodinger_test/1582/pipeline/62/

tikv.log

[2021/11/13 16:19:20.076 +00:00] [INFO] [pd.rs:814] ["try to batch split region"] [task=batch_split] [region="id: 148908 start_key: 7480000000000000FF3B5F728000000001FF692EE50000000000FA end_key: 7480000000000000FF3B5F728000000001FFAADCA60000000000FA region_epoch { conf_ver: 6287 version: 728 } peers { id: 148909 store_id: 5 } peers { id: 148910 store_id: 62 role: Learner } peers { id: 149085 store_id: 
65 role: Learner }"] [new_region_ids="[new_region_id: 149086 new_peer_ids: 149087 new_peer_ids: 149088 new_peer_ids: 149089, new_region_id: 149090 new_peer_ids: 149091 new_peer_ids: 149092 new_peer_ids: 
149093, new_region_id: 149094 new_peer_ids: 149095 new_peer_ids: 149096 new_peer_ids: 149097]"] [region_id=148908]
[2021/11/13 16:19:20.077 +00:00] [INFO] [apply.rs:1382] ["execute admin command"] [command="cmd_type: BatchSplit splits { requests { split_key: 7480000000000000FF3B5F728000000001FF799A550000000000FA new_region_id: 149086 new_peer_ids: 149087 new_peer_ids: 149088 new_peer_ids: 149089 } requests { split_key: 7480000000000000FF3B5F728000000001FF8A05C80000000000FA new_region_id: 149090 new_peer_ids: 149091 
new_peer_ids: 149092 new_peer_ids: 149093 } requests { split_key: 7480000000000000FF3B5F728000000001FF9A712D0000000000FA new_region_id: 149094 new_peer_ids: 149095 new_peer_ids: 149096 new_peer_ids: 149097 } right_derive: true }"] [index=14] [term=6] [peer_id=148909] [region_id=148908]
[2021/11/13 16:19:20.077 +00:00] [INFO] [apply.rs:2223] ["split region"] [keys="3 keys range from 7480000000000000FF3B5F728000000001FF799A550000000000FA to 7480000000000000FF3B5F728000000001FF9A712D0000000000FA"] [region="id: 148908 start_key: 7480000000000000FF3B5F728000000001FF692EE50000000000FA end_key: 7480000000000000FF3B5F728000000001FFAADCA60000000000FA region_epoch { conf_ver: 6287 version: 728 
} peers { id: 148909 store_id: 5 } peers { id: 148910 store_id: 62 role: Learner } peers { id: 149085 store_id: 65 role: Learner }"] [peer_id=148909] [region_id=148908]
[2021/11/13 16:19:20.077 +00:00] [INFO] [pd.rs:1249] ["try to change peer"] [kind=LeaveJoint] [changes="[]"] [region_id=148816]
[2021/11/13 16:19:20.077 +00:00] [INFO] [peer.rs:3690] ["propose conf change peer"] [kind=LeaveJoint] [changes="[]"] [peer_id=149072] [region_id=148816]
[2021/11/13 16:19:20.078 +00:00] [INFO] [endpoint.rs:158] ["region met split/merge command, stop tracking since key range changed, wait for re-register"] [req_type=BatchSplit]
[2021/11/13 16:19:20.078 +00:00] [INFO] [peer.rs:2627] ["notify pd with split"] [split_count=4] [peer_id=148909] [region_id=148908]
[2021/11/13 16:19:20.078 +00:00] [INFO] [peer.rs:2692] ["insert new region"] [region="id: 149086 start_key: 7480000000000000FF3B5F728000000001FF692EE50000000000FA end_key: 7480000000000000FF3B5F728000000001FF799A550000000000FA region_epoch { conf_ver: 6287 version: 731 } peers { id: 149087 store_id: 5 } peers { id: 149088 store_id: 62 role: Learner } peers { id: 149089 store_id: 65 role: Learner }"] [region_id=149086]
[2021/11/13 16:19:20.079 +00:00] [INFO] [peer.rs:208] ["create peer"] [peer_id=149087] [region_id=149086]
[2021/11/13 16:19:20.079 +00:00] [INFO] [raft.rs:2609] ["switched to configuration"] [config="Configuration { voters: Configuration { incoming: Configuration { voters: {149087} }, outgoing: Configuration { voters: {} } }, learners: {149088, 149089}, learners_next: {}, auto_leave: false }"] [raft_id=149087] [region_id=149086]
[2021/11/13 16:19:20.079 +00:00] [INFO] [raft.rs:1092] ["became follower at term 5"] [term=5] [raft_id=149087] [region_id=149086]
[2021/11/13 16:19:20.079 +00:00] [INFO] [raft.rs:384] [newRaft] [peers="Configuration { incoming: Configuration { voters: {149087} }, outgoing: Configuration { voters: {} } }"] ["last term"=5] ["last index"=5] [applied=5] [commit=5] [term=5] [raft_id=149087] [region_id=149086]
[2021/11/13 16:19:20.079 +00:00] [INFO] [raw_node.rs:315] ["RawNode created with id 149087."] [id=149087] [raft_id=149087] [region_id=149086]
[2021/11/13 16:19:20.079 +00:00] [INFO] [raft.rs:1517] ["starting a new election"] [term=5] [raft_id=149087] [region_id=149086]
[2021/11/13 16:19:20.079 +00:00] [INFO] [raft.rs:1142] ["became pre-candidate at term 5"] [term=5] [raft_id=149087] [region_id=149086]
[2021/11/13 16:19:20.079 +00:00] [INFO] [raft.rs:1116] ["became candidate at term 6"] [term=6] [raft_id=149087] [region_id=149086]
[2021/11/13 16:19:20.079 +00:00] [INFO] [raft.rs:1200] ["became leader at term 6"] [term=6] [raft_id=149087] [region_id=149086]

proxy.log

[2021/11/13 16:19:22.087 +00:00] [INFO] [peer.rs:248] ["replicate peer"] [peer_id=149089] [region_id=149086]
[2021/11/13 16:19:22.087 +00:00] [INFO] [raft.rs:2605] ["switched to configuration"] [config="Configuration { voters: Configuration { incoming: Configuration { voters: {} }, outgoing: Configuration { voters: {} } }, learners: {}, learners_next: {}, auto_leave: false }"] [raft_id=149089] [region_id=149086]
[2021/11/13 16:19:22.087 +00:00] [INFO] [raft.rs:1088] ["became follower at term 0"] [term=0] [raft_id=149089] [region_id=149086]
[2021/11/13 16:19:22.087 +00:00] [INFO] [raft.rs:389] [newRaft] [peers="Configuration { incoming: Configuration { voters: {} }, outgoing: Configuration { voters: {} } }"] ["last term"=0] ["last index"=0] [applied=0] [commit=0] [term=0] [raft_id=149089] [region_id=149086]
[2021/11/13 16:19:22.087 +00:00] [INFO] [raw_node.rs:315] ["RawNode created with id 149089."] [id=149089] [raft_id=149089] [region_id=149086]
[2021/11/13 16:19:22.087 +00:00] [INFO] [raft.rs:1332] ["received a message with higher term from 149087"] ["msg type"=MsgHeartbeat] [message_term=6] [term=0] [from=149087] [raft_id=149089] [region_id=149086]
[2021/11/13 16:19:22.087 +00:00] [INFO] [raft.rs:1088] ["became follower at term 6"] [term=6] [raft_id=149089] [region_id=149086]
[2021/11/13 16:19:22.095 +00:00] [INFO] [peer.rs:1662] ["receives gc message, trying to remove"] [to_peer="id: 149089 store_id: 65 role: Learner"] [peer_id=149089] [region_id=149086]
[2021/11/13 16:19:22.095 +00:00] [INFO] [peer.rs:1949] ["starts destroy"] [merged_by_target=false] [peer_id=149089] [region_id=149086]
[2021/11/13 16:19:22.095 +00:00] [INFO] [peer.rs:808] ["begin to destroy"] [peer_id=149089] [region_id=149086]
[2021/11/13 16:19:22.095 +00:00] [INFO] [peer_storage.rs:1668] ["finish clear peer meta"] [takes=55.348µs] [raft_key=1] [apply_key=1] [meta_key=1] [region_id=149086]
[2021/11/13 16:19:22.095 +00:00] [INFO] [peer.rs:848] ["peer destroy itself"] [takes=463.175µs] [peer_id=149089] [region_id=149086]
[2021/11/13 16:19:22.095 +00:00] [INFO] [router.rs:300] ["[region 149086] shutdown mailbox"]
[2021/11/13 16:20:35.556 +00:00] [INFO] [apply.rs:1419] ["execute admin command"] [command="cmd_type: BatchSplit splits { requests { split_key: 7480000000000000FF3B5F728000000001FF799A550000000000FA new_region_id: 149086 new_peer_ids: 149087 new_peer_ids: 149088 new_peer_ids: 149089 } requests { split_key: 7480000000000000FF3B5F728000000001FF8A05C80000000000FA new_region_id: 149090 new_peer_ids: 149091 new_peer_ids: 149092 new_peer_ids: 149093 } requests { split_key: 7480000000000000FF3B5F728000000001FF9A712D0000000000FA new_region_id: 149094 new_peer_ids: 149095 new_peer_ids: 149096 new_peer_ids: 149097 } right_derive: true }"] [index=14] [term=6] [peer_id=149085] [region_id=148908]
[2021/11/13 16:20:35.556 +00:00] [INFO] [apply.rs:2341] ["split region"] [keys="3 keys range from 7480000000000000FF3B5F728000000001FF799A550000000000FA to 7480000000000000FF3B5F728000000001FF9A712D0000000000FA"] [region="id: 148908 start_key: 7480000000000000FF3B5F728000000001FF692EE50000000000FA end_key: 7480000000000000FF3B5F728000000001FFAADCA60000000000FA region_epoch { conf_ver: 6287 version: 728 } peers { id: 148909 store_id: 5 } peers { id: 148910 store_id: 62 role: Learner } peers { id: 149085 store_id: 65 role: Learner }"] [peer_id=149085] [region_id=148908]
[2021/11/13 16:20:35.556 +00:00] [WARN] [apply.rs:2458] ["new region from splitting already exists"] [peer_id=149085] [region_id=148908] [reason="state state: Tombstone region { id: 149086 } exist in kv engine"] [new_peer_id=149089] [new_region_id=149086]

tiflash.log

[2021/11/13 16:21:24.527 +00:00] [INFO] [<unknown>] ["KVStore: Try to apply snapshot: [region 149282, applied: term 6 index 8]"] [thread_id=12]
[2021/11/13 16:21:24.565 +00:00] [ERROR] [<unknown>] ["void DB::ApplyPreHandledSnapshot(DB::EngineStoreServerWrap*, PreHandledSnapshot*) [with PreHandledSnapshot = DB::PreHandledSnapshotWithFiles]: Code: 49, e.displayText() = DB::Exception: DB::KVStore::checkAndApplySnapshot(const RegionPtrWrap&, DB::TMTContext&)::<lambda(DB::RegionMap, const DB::KVStoreTaskLock&)> [with RegionPtrWrap = DB::RegionPtrWithSnapshotFiles; DB::RegionMap = std::unordered_map<long unsigned int, std::shared_ptr<DB::Region> >]: range of region 149282 is overlapped with region 149086, should not happen, e.what() = DB::Exception, Stack trace:\n\n0. /tmp/ti/schrodinger/bank/tiflash1/tiflash(StackTrace::StackTrace()+0x16) [0x370f6c6]\n1. /tmp/ti/schrodinger/bank/tiflash1/tiflash(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x26) [0x37034c6]\n2. /tmp/ti/schrodinger/bank/tiflash1/tiflash(void DB::KVStore::checkAndApplySnapshot<DB::RegionPtrWithSnapshotFiles>(DB::RegionPtrWithSnapshotFiles const&, DB::TMTContext&)::{lambda(std::unordered_map<unsigned long, std::shared_ptr<DB::Region>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::shared_ptr<DB::Region> > > >, DB::KVStoreTaskLock const&)#1}::operator()(std::unordered_map<unsigned long, std::shared_ptr<DB::Region>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::shared_ptr<DB::Region> > > >, DB::KVStoreTaskLock const&) const+0x176) [0x7b36cc6]\n3. /tmp/ti/schrodinger/bank/tiflash1/tiflash(std::_Function_handler<void (std::unordered_map<unsigned long, std::shared_ptr<DB::Region>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::shared_ptr<DB::Region> > > >, DB::KVStoreTaskLock const&), void DB::KVStore::checkAndApplySnapshot<DB::RegionPtrWithSnapshotFiles>(DB::RegionPtrWithSnapshotFiles const&, DB::TMTContext&)::{lambda(std::unordered_map<unsigned long, std::shared_ptr<DB::Region>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::shared_ptr<DB::Region> > > >, DB::KVStoreTaskLock const&)#1}>::_M_invoke(std::_Any_data const&, std::unordered_map<unsigned long, std::shared_ptr<DB::Region>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::shared_ptr<DB::Region> > > >&&, DB::KVStoreTaskLock const&)+0x9f) [0x7b3715f]\n4. /tmp/ti/schrodinger/bank/tiflash1/tiflash(DB::KVStore::handleRegionsByRangeOverlap(std::pair<DB::TiKVRangeKey, DB::TiKVRangeKey> const&, std::function<void (std::unordered_map<unsigned long, std::shared_ptr<DB::Region>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::shared_ptr<DB::Region> > > >, DB::KVStoreTaskLock const&)>&&) const+0x53) [0x7759d13]\n5. /tmp/ti/schrodinger/bank/tiflash1/tiflash(void DB::KVStore::checkAndApplySnapshot<DB::RegionPtrWithSnapshotFiles>(DB::RegionPtrWithSnapshotFiles const&, DB::TMTContext&)+0x283) [0x7b39b43]\n6. /tmp/ti/schrodinger/bank/tiflash1/tiflash(void DB::KVStore::handlePreApplySnapshot<DB::RegionPtrWithSnapshotFiles>(DB::RegionPtrWithSnapshotFiles const&, DB::TMTContext&)+0x15c) [0x7b3a72c]\n7. /tmp/ti/schrodinger/bank/tiflash1/tiflash(ApplyPreHandledSnapshot+0x8d) [0x776c5fd]\n8. /tmp/ti/schrodinger/bank/tiflash1/libtiflash_proxy.so(+0x1432a60) [0x7f93674caa60]\n9. /tmp/ti/schrodinger/bank/tiflash1/libtiflash_proxy.so(+0x142e048) [0x7f93674c6048]\n10. /tmp/ti/schrodinger/bank/tiflash1/libtiflash_proxy.so(+0x1247021) [0x7f93672df021]\n11. /tmp/ti/schrodinger/bank/tiflash1/libtiflash_proxy.so(+0x1d384e6) [0x7f9367dd04e6]\n12. /tmp/ti/schrodinger/bank/tiflash1/libtiflash_proxy.so(+0xd7d42f) [0x7f9366e1542f]\n13. /tmp/ti/schrodinger/bank/tiflash1/libtiflash_proxy.so(+0xdaf5f7) [0x7f9366e475f7]\n14. /tmp/ti/schrodinger/bank/tiflash1/libtiflash_proxy.so(+0x14f757b) [0x7f936758f57b]\n15. /usr/lib64/libpthread.so.0(+0x82de) [0x7f93658592de]\n16. /usr/lib64/libc.so.6(clone+0x43) [0x7f9365207a63]\n"] [thread_id=12]

Description

  • region 149086 is created by BatchSplit cmd in tikv at 2021/11/13 16:19:20 but such cmd did not be executed in proxy until 2021/11/13 16:20:35.
  • In proxy, region 149086 connected with tikv leader at 2021/11/13 16:19:22 and received gc message. Its state was set Tombstone.
  • proxy executed BatchSplit cmd at 2021/11/13 16:20:35 and got warning new region from splitting already exists state state: Tombstone region { id: 149086 } exist in kv engine. TiFlash executed BatchSplit cmd as normal, then region 149086 was created.
  • TiFlash tried to apply snapshot of region 149282 and meet exception about range of region 149282 is overlapped with region 149086.
@solotzg solotzg added the type/bug The issue is confirmed as a bug. label Nov 15, 2021
@solotzg
Copy link
Contributor Author

solotzg commented Nov 15, 2021

TODO:

@ilovesoup
Copy link
Contributor

How often this happens and is this introduced in new PRs?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants