This repository has been archived by the owner on Jul 24, 2024. It is now read-only.

Default CF lost after BR restore? #565

Closed
YuJuncen opened this issue Oct 23, 2020 · 4 comments
Assignees: 3pointer
Labels: difficulty/3-hard, severity/moderate, type/bug

Comments

@YuJuncen
Collaborator

YuJuncen commented Oct 23, 2020

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    If possible, provide a recipe for reproducing the error.
    Run test br_full_ddl.

  2. What did you expect to see?
    The test case succeeds, or fails due to a network failure such as a context deadline being exceeded.

  3. What did you see instead?
    default not found (see the TiKV log excerpt under Notes below)

  4. What version of BR and TiDB/TiKV/PD are you using?

See the log.

Notes

Unfortunately, all BR logs for this test case were lost because the pod was deleted. Only two error lines remain in the TiKV 1 log:

[2020-10-23T00:17:00.659Z] [2020/10/23 08:16:59.632 +08:00] [ERROR] [mod.rs:311] ["default value not found"] [hint=near_load_data_by_write] [key=7480000000000000465F72017573657236323835FF3732363334313135FF3538323132333500FE]
[2020-10-23T00:17:00.659Z] [2020/10/23 08:16:59.632 +08:00] [WARN] [endpoint.rs:596] [error-response] [err="default not found: key:7480000000000000465F72017573657236323835FF3732363334313135FF3538323132333500FE, maybe read truncated/dropped table data?"]
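
The key in the log can be decoded by hand. Here is a minimal sketch in Go that re-implements the decoding instead of importing tidb/util/codec, assuming TiDB's standard key layout t{tableID}_r{handle} and its usual codec flags:

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

func main() {
	// The key from the TiKV error above.
	key, err := hex.DecodeString("7480000000000000465F72017573657236323835FF3732363334313135FF3538323132333500FE")
	if err != nil {
		panic(err)
	}

	// Layout: 't' + 8-byte table ID (sign bit flipped for memcomparable
	// ordering) + "_r" + encoded handle.
	tableID := int64(binary.BigEndian.Uint64(key[1:9]) ^ (1 << 63))

	// Handle type flag: 0x03 would be an int64 row ID; 0x01 is the bytes
	// flag used by common-handle (clustered-index) keys.
	rest := key[11:]
	flag := rest[0]
	rest = rest[1:]

	// Memcomparable bytes encoding: 8-byte groups, each followed by a
	// marker byte equal to 0xFF minus the number of padding bytes.
	var handle []byte
	for len(rest) >= 9 {
		pad := int(0xFF - rest[8])
		handle = append(handle, rest[:8-pad]...)
		rest = rest[9:]
	}

	fmt.Printf("table ID %d, flag %#x, handle %q\n", tableID, flag, handle)
	// Prints: table ID 70, flag 0x1, handle "user6285726341155821235"
}
```

So the missing key carries a string handle rather than an int64 row ID, i.e. it belongs to a table stored with a clustered (common-handle) primary key.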

We may have to wait for the next occurrence to gather more details. Currently we only know that it happens in the checksumming stage. That stage scans every row, and a write-CF record that does not inline its value points into the default CF, so "default not found" means such a pointer resolved to a key that was never restored.
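
As an aside, here is a minimal model of that lookup, with plain Go maps standing in for TiKV's actual Rust storage types, assuming the standard MVCC layout where the write CF either inlines a short value or references the default CF at (user_key, start_ts):

```go
package main

import "fmt"

// WriteRecord models the value stored in TiKV's write CF for one commit.
type WriteRecord struct {
	StartTS    uint64
	ShortValue []byte // inlined when the value is small; nil otherwise
}

func readValue(defaultCF map[string][]byte, userKey string, w WriteRecord) ([]byte, error) {
	if w.ShortValue != nil {
		return w.ShortValue, nil
	}
	// Long-value case: the row body must exist in the default CF at
	// (user_key, start_ts). If the restore never wrote that key, this is
	// exactly the "default value not found" error seen in the log above.
	v, ok := defaultCF[fmt.Sprintf("%s@%d", userKey, w.StartTS)]
	if !ok {
		return nil, fmt.Errorf("default not found: key %s", userKey)
	}
	return v, nil
}

func main() {
	defaultCF := map[string][]byte{} // empty: simulates missing default-CF data
	_, err := readValue(defaultCF, "t70_r...", WriteRecord{StartTS: 420000000000000001})
	fmt.Println(err) // default not found: key t70_r...
}
```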

Some guesses at the possible cause:

  • In the restore pipeline, a table that has not been fully restored is sent to checksum.
  • (TBD)

PD: d0430729845d309370d8d4604bda991fc64fc7f8
TiDB: 45b65d16eb3f51f6b9a2a0790b3b743dcf8b154f
TiKV: 417be27592712f3c752ec8e4c1d4520fe50aae5c

Logs: defaultnotfound.zip

YuJuncen added the type/bug and difficulty/3-hard labels on Oct 23, 2020
@3pointer
Collaborator

3pointer commented Nov 2, 2020

After reproducing it, we found that this issue can happen when the backup is taken with cluster_index disabled and the restore runs with cluster_index enabled.

We captured the backup SST files from the test and restored them into a new cluster with cluster_index enabled, reproducing the same error.
[Screenshot 2020-11-02 20:29:13]

With the same backup SST files, and cluster_index simply disabled in the same cluster,
[Screenshot 2020-11-02 20:31:43]

the restore completes normally.
[Screenshot 2020-11-02 20:32:00]

3pointer self-assigned this on Nov 4, 2020
@overvenus
Member

How to fix this issue?

@YuJuncen
Collaborator Author

Possible solution: back up the value of tidb_enable_clustered_index and, when creating tables during restore, set the variable accordingly.
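
A minimal sketch of that idea, assuming the restore side issues its DDL over a MySQL-protocol connection; the restoreTable helper and its clusteredIndex argument are hypothetical, not BR's real code:

```go
package main

import (
	"context"
	"database/sql"

	_ "github.com/go-sql-driver/mysql"
)

// restoreTable pins one connection so that the SET SESSION statement
// actually affects the CREATE TABLE that follows: database/sql may
// otherwise run the two statements on different pooled connections.
func restoreTable(ctx context.Context, db *sql.DB, clusteredIndex bool, createTableSQL string) error {
	conn, err := db.Conn(ctx)
	if err != nil {
		return err
	}
	defer conn.Close()

	// clusteredIndex is the value recorded at backup time, so the restored
	// table gets the same row-key layout (int handle vs. common handle)
	// that the backed-up SSTs were written with.
	val := "0"
	if clusteredIndex {
		val = "1"
	}
	if _, err := conn.ExecContext(ctx, "SET SESSION tidb_enable_clustered_index = "+val); err != nil {
		return err
	}
	_, err = conn.ExecContext(ctx, createTableSQL)
	return err
}

func main() {
	db, err := sql.Open("mysql", "root@tcp(127.0.0.1:4000)/test") // TiDB's default port; adjust as needed
	if err != nil {
		panic(err)
	}
	defer db.Close()
	if err := restoreTable(context.Background(), db, true,
		"CREATE TABLE t (id VARCHAR(64) PRIMARY KEY, v BLOB)"); err != nil {
		panic(err)
	}
}
```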

But I am perplexed: we never touch this variable in the br_full_ddl test case, so why did this test fail with default not found...?

@3pointer
Collaborator

> Possible solution: back up the value of tidb_enable_clustered_index and, when creating tables during restore, set the variable accordingly.
>
> But I am perplexed: we never touch this variable in the br_full_ddl test case, so why did this test fail with default not found...?

...yes, that's strange. I guess it's some kind of CI problem. Anyway, I copied the backup data from the CI container, changed the cluster_index configuration, and the restore succeeded. So I think we can close this for now.
