PITR: Run PITR for multiple times could lead to tiflash crash #52628

Closed
JaySon-Huang opened this issue Apr 16, 2024 · 8 comments · Fixed by #53658
Labels
affects-8.1 · component/br (This issue is related to BR of TiDB.) · impact/crash (crash/fatal) · severity/critical · type/bug (The issue is confirmed as a bug.)

Comments

JaySon-Huang (Contributor) commented Apr 16, 2024

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

  1. Take a snapshot backup and a log backup for PITR.
  2. Restore the data up to tso1 into a new cluster with TiFlash instances using br restore point.
  3. Add TiFlash replica(s) for the restored table(s). (If the backup data already contains TiFlash replicas, the replicas are added automatically after step 2.)
  4. Restore the data within the range tso1...tso2 into the same cluster using br restore point.

2. What did you expect to see? (Required)

The restore succeeds and all instances run normally.

3. What did you see instead (Required)

When running step 4, the TiFlash instances crash with a backtrace like:

[FATAL] [Exception.cpp:106] ["Code: 9008, e.displayText() = DB::Exception: Raw TiDB PK: C80000000CC265A2, Prewrite ts: 449057189415092426 can not found in default cf for key: 7480000000003580FFD75F72C80000000CFFC265A20000000000FAF9C4A0DD8D13FEE1, region_id: 43498, applied_index: 31: (applied_term: 7) ...

4. What is your TiDB version? (Required)

v7.5.1

JaySon-Huang added the type/bug (The issue is confirmed as a bug.) label on Apr 16, 2024
JaySon-Huang (Contributor, Author) commented Apr 16, 2024

This happens because PITR restores the logs into the cluster without caring about the order between the default cf and the write cf (to speed up the restore). TiFlash, however, relies on the invariant that when it applies a write cf key, the corresponding default cf key must already exist; otherwise it cannot decode the key-value pairs into column data correctly. When TiFlash sees a write cf entry without its corresponding default cf entry, it panics. A minimal sketch of the invariant follows below.
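The following is an illustrative sketch of that invariant, not the actual TiFlash code (which is C++ and far more involved); all type and function names here are made up. A write-cf record that does not inline a short value must find its row data in the default cf under (key, start_ts), otherwise decoding fails with the kind of error shown in the backtrace above.

```go
package main

import "fmt"

// cfKey identifies a default-cf entry: the row key plus its prewrite (start) ts.
type cfKey struct {
	key     string
	startTS uint64
}

// writeRecord is a simplified write-cf entry. shortValue is nil when the row
// data was written to the default cf instead of being inlined.
type writeRecord struct {
	key        string
	commitTS   uint64
	startTS    uint64
	shortValue []byte
}

// regionStore holds the default-cf data TiFlash has applied for one region.
type regionStore struct {
	defaultCF map[cfKey][]byte
}

// applyWrite mirrors the check that fails in the reported crash: without a
// short value, the row data must already be present in the default cf.
func (s *regionStore) applyWrite(w writeRecord) ([]byte, error) {
	if w.shortValue != nil {
		return w.shortValue, nil
	}
	v, ok := s.defaultCF[cfKey{w.key, w.startTS}]
	if !ok {
		return nil, fmt.Errorf("key %q with prewrite ts %d not found in default cf",
			w.key, w.startTS)
	}
	return v, nil
}

func main() {
	s := &regionStore{defaultCF: map[cfKey][]byte{}}
	// PITR replayed the write-cf entry before (or without) its default-cf entry:
	_, err := s.applyWrite(writeRecord{key: "row-key", commitTS: 101, startTS: 100})
	fmt.Println(err) // this is the situation in which TiFlash raises the fatal error
}
```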

AkiraXie commented

/component br
/severity critical

BornChanger (Contributor) commented

It's a compatibility issue. We don't have a solution to resolve it yet, so we have to document the limitation.

BornChanger (Contributor) commented

@JaySon-Huang can TiFlash lift the restriction instead?

JaySon-Huang (Contributor, Author) commented

@BornChanger During step 4 (running PITR restore point again), TiFlash cannot tell whether a write cf key without its default cf key comes from a corrupted Raft log that was accepted in violation of the transaction model, or from a Raft log recovered by PITR, so TiFlash cannot lift the restriction only for PITR.
Can PITR guarantee that, within a single region, all KVs in the default_cf are written before any KVs in the write_cf are restored? I think this would resolve the problem (see the sketch below).
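A hedged sketch of the guarantee asked for above (this is not the actual BR implementation nor the fix in #53658; `kvEntry`, `ingestFunc`, and `restoreRegionBatch` are illustrative names): within one region batch, ingest all default-cf entries before any write-cf entry, so a write record never lands without its row data.

```go
package main

import "fmt"

// kvEntry is a restored key-value pair tagged with its column family.
type kvEntry struct {
	cf    string // "default" or "write"
	key   []byte
	value []byte
}

// ingestFunc stands in for whatever API actually writes a batch of KVs into a region.
type ingestFunc func(cf string, kvs []kvEntry) error

// restoreRegionBatch flushes every default-cf entry of the batch before any
// write-cf entry, preserving the invariant TiFlash relies on.
func restoreRegionBatch(batch []kvEntry, ingest ingestFunc) error {
	var defaults, writes []kvEntry
	for _, kv := range batch {
		if kv.cf == "default" {
			defaults = append(defaults, kv)
		} else {
			writes = append(writes, kv)
		}
	}
	if err := ingest("default", defaults); err != nil {
		return err
	}
	// Only after the default cf is durable may the write cf be restored.
	return ingest("write", writes)
}

func main() {
	batch := []kvEntry{
		{cf: "write", key: []byte("k1"), value: []byte("write-record")},
		{cf: "default", key: []byte("k1"), value: []byte("row-data")},
	}
	_ = restoreRegionBatch(batch, func(cf string, kvs []kvEntry) error {
		fmt.Println("ingesting", len(kvs), "kvs into", cf)
		return nil
	})
}
```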

YuJuncen (Contributor) commented Jun 5, 2024

I guess we need further discussion to decide whether to bring this to the release branches. For now, just fix it in master.

ti-chi-bot pushed a commit that referenced this issue on Jun 5, 2024
seiya-annie commented

/found customer

ti-chi-bot added the report/customer (Customers have encountered this bug.) label on Jun 19, 2024
seiya-annie commented

/remove-found customer

ti-chi-bot removed the report/customer (Customers have encountered this bug.) label on Jun 19, 2024