*: Check mutations for single-row changes #27920

ekexium · 2021-09-09T04:21:33Z

What problem does this PR solve?

Issue Number: part of #26833

Problem Summary:

Reduce data-index inconsistency issues by checking whether single-row changes generate corrupted mutations.

What is changed and how it works?

What's Changed:

RemoveRecord creates a mem buffer stage, which is used to collect mutations generated by the operation
For AddRecord, UpdateRecord and RemoveRecord, check the consistency of its mutations before releasing its mem buffer stage

How it Works:

Record mutations must be consistent with the input
Values of index mutations must be consistent with the input
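The Go sketch below applies the two rules above to an inserted row. It is not the PR's code. It assumes the mutations collected from the stage have already been decoded; the PR uses tablecodec for that step. The names decodedMutation, checkMutations and equalValues are hypothetical, and so is the use of string-valued columns.

```go
package main

import "fmt"

// decodedMutation is a hypothetical, already-decoded view of one key/value
// pair collected from a mem buffer stage (TiDB would decode it with tablecodec).
type decodedMutation struct {
	isRow   bool     // true for a record key, false for an index key
	indexID int64    // meaningful only when isRow is false
	values  []string // decoded column values
}

func equalValues(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// checkMutations enforces, for an inserted row:
//   - exactly one row mutation, carrying exactly the input values;
//   - every index mutation carries the input values of its indexed columns.
// indexCols maps an index ID to the offsets of its columns in the input row.
func checkMutations(input []string, indexCols map[int64][]int, muts []decodedMutation) error {
	rowSeen := false
	for _, m := range muts {
		if m.isRow {
			if rowSeen {
				return fmt.Errorf("multiple row mutations")
			}
			rowSeen = true
			if !equalValues(m.values, input) {
				return fmt.Errorf("inconsistent row mutation: got %v, want %v", m.values, input)
			}
			continue
		}
		offsets, ok := indexCols[m.indexID]
		if !ok {
			return fmt.Errorf("mutation for unknown index %d", m.indexID)
		}
		// Project the input row onto the index columns to get the expected values.
		want := make([]string, 0, len(offsets))
		for _, off := range offsets {
			want = append(want, input[off])
		}
		if !equalValues(m.values, want) {
			return fmt.Errorf("inconsistent index mutation for index %d: got %v, want %v", m.indexID, m.values, want)
		}
	}
	if !rowSeen {
		return fmt.Errorf("no row mutation found")
	}
	return nil
}
```

Deletions would need a different rule. For a RemoveRecord, the row mutation is a delete with no value to compare against.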

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Please add a release note, or a 'None' if it is not needed.

ti-chi-bot · 2021-09-09T04:21:34Z

[REVIEW NOTIFICATION]

This pull request has been approved by:

MyonKeminta
cfzjywxk

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

table/tables/mutation_checker.go

cfzjywxk · 2021-09-09T12:33:13Z


+func CheckIndexConsistency(sessVars *variable.SessionVars, t *TableCommon,
+	dataAdded, dataRemoved []types.Datum, memBuffer kv.MemBuffer, sh kv.StagingHandle) error {
+	sc := sessVars.StmtCtx
+	if sh == 0 {


When will this happen? If it's unexpected, should we print an error log here?

I don't quite remember when it's needed. I will let it return an error and see if any test fails.

Some implementations of MemBuffer don't support staging. For example, the one in lightning:

tidb/br/pkg/lightning/backend/kv/session.go

Lines 138 to 140 in b25a392

func (mb *kvMemBuf) Staging() kv.StagingHandle {
	return 0
}

I think we can just ignore them.

table/tables/mutation_checker.go

cfzjywxk · 2021-09-09T12:45:21Z


+			return errors.Trace(err)
+		}
+		if cmp != 0 {
+			logutil.BgLogger().Error("inconsistent row mutation", zap.String("decoded datum", decodedDatum.String()),


For error reporting we could use the reporter from #27388, and inconsistency events could be traced by the transaction event.
/cc @MyonKeminta @longfangsong

cfzjywxk · 2021-09-09T12:48:42Z

table/tables/mutation_checker.go

+func checkIndexKeys(sc *stmtctx.StatementContext, sessVars *variable.SessionVars, t *TableCommon,
+	dataAdded []types.Datum, dataRemoved []types.Datum, mutations []mutation) error {
+	indexIDMap := make(map[int64]indexHelperInfo)
+	for _, index := range t.indices {


Should the clustered primary index be skipped here? If so, we could rename it checkSecondaryIndexKeys.

It can be skipped.
I think there are still cases where the (non-clustered) primary indices need to be checked?
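The sketch below covers the index-selection question above. A clustered primary key is skipped, and a non-clustered primary key is still checked as an ordinary index. The sketch assumes that an integer clustered primary key (PKIsHandle) has no IndexInfo entry at all, which is our understanding of TiDB, so only the common-handle case needs an explicit skip. The names indexInfo, tableInfo and indexesToCheck are hypothetical.

```go
package main

// indexInfo and tableInfo are hypothetical, trimmed-down stand-ins for
// model.IndexInfo and model.TableInfo.
type indexInfo struct {
	ID      int64
	Primary bool
}

type tableInfo struct {
	IsCommonHandle bool // clustered index on a non-integer primary key
	Indices        []indexInfo
}

// indexesToCheck returns the IDs of indexes whose mutations should be checked.
// A clustered primary key has no separate index KV (the row key is encoded
// from it), so it is skipped. A non-clustered primary key is stored as an
// ordinary index entry and is still checked.
func indexesToCheck(t tableInfo) []int64 {
	ids := make([]int64, 0, len(t.Indices))
	for _, idx := range t.Indices {
		if idx.Primary && t.IsCommonHandle {
			continue
		}
		ids = append(ids, idx.ID)
	}
	return ids
}
```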

cfzjywxk · 2021-09-09T12:51:06Z

table/tables/mutation_checker.go

+		if len(m.value) == 0 && NeedRestoredData(indexHelperInfo.indexInfo.Columns, t.Meta().Columns) {
+			continue
+		}
+
+		decodedIndexValues, err := tablecodec.DecodeIndexKV(m.key, m.value, len(indexHelperInfo.indexInfo.Columns),
+			tablecodec.HandleNotNeeded, indexHelperInfo.rowColInfos)


Will index key features such as expression indexes, collations, or row formats introduce corner cases here?
Need help /cc @lysu @wjhuang2016

wjhuang2016 · 2021-09-13T09:08:39Z

No, I think it's fine here.

wjhuang2016 · 2021-09-13T09:11:03Z

table/tables/mutation_checker.go

+	}
+	mutations := collectTableMutationsFromBufferStage(t, memBuffer, sh)
+	if err := checkRowAdditionConsistency(sessVars, t.Meta().Columns, dataAdded, mutations); err != nil {
+		return errors.Trace(err)


No need to use Trace if the err is generated by errors.New

The errors are temporary. They might change after #27388 is merged

ekexium · 2021-09-14T06:13:39Z

/run-all-tests

MyonKeminta

Some of the comments seem outdated... but please still take a look.

table/tables/mutation_checker.go

MyonKeminta · 2021-09-14T07:24:50Z


+}
+
+func collectTableMutationsFromBufferStage(t *TableCommon, memBuffer kv.MemBuffer, sh kv.StagingHandle) []mutation {
+	mutations := make([]mutation, 0)


I think it's possible to make the membuffer support getting the size of the current stage, so that we can reserve enough space and allocate exactly once. But the change might be too much for a single PR.

table/tables/mutation_checker.go

MyonKeminta · 2021-09-14T09:38:08Z


+	columnMap := make(map[int64]*model.ColumnInfo)
+	for _, col := range tableColumns {
+		columnMap[col.ID] = col
+	}


This looks too expensive if we do this for each row... If this is really necessary, can we try to store it somewhere like the sessionctx, to make it reusable?

I tried to save it in the stmtctx. PTAL if it's reasonable
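The sketch below illustrates the reuse idea discussed above. It assumes a hypothetical per-statement cache keyed by table ID. The first row checked for a table builds the column-ID map, and every later row gets the same map back. The names stmtCache, columnInfo and columnMap are hypothetical. The sketch also assumes the table schema does not change while the cache is alive.

```go
package main

// columnInfo is a hypothetical stand-in for model.ColumnInfo.
type columnInfo struct {
	ID   int64
	Name string
}

// stmtCache is a hypothetical per-statement cache of column maps.
type stmtCache struct {
	columnMaps map[int64]map[int64]*columnInfo // table ID -> column ID -> column
	builds     int                             // number of maps constructed, for illustration
}

// columnMap returns the column-ID map for a table, building it at most once
// per cache lifetime instead of once per row.
func (c *stmtCache) columnMap(tableID int64, cols []*columnInfo) map[int64]*columnInfo {
	if m, ok := c.columnMaps[tableID]; ok {
		return m
	}
	if c.columnMaps == nil {
		c.columnMaps = make(map[int64]map[int64]*columnInfo)
	}
	m := make(map[int64]*columnInfo, len(cols))
	for _, col := range cols {
		m[col.ID] = col
	}
	c.columnMaps[tableID] = m
	c.builds++
	return m
}
```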


ekexium · 2021-09-15T06:13:31Z

/run-all-tests

ekexium · 2021-09-15T07:50:09Z

/run-all-tests

MyonKeminta · 2021-09-15T05:10:20Z

table/tables/mutation_checker.go

+					if rowInsertion.key == nil {
+						rowInsertion = m
+					} else {
+						err = errors.Errorf("multiple row mutations added/mutated, one = %+v, another = %+v", rowInsertion, m)


Be careful: the data may need to be redacted if you are going to print the error to the log.

The errors will be replaced with the reporter (in the following PR, I suppose). I think we can handle redactions there.
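The sketch below shows the error from the diff above with its user data redacted before it can reach a log. It assumes a hypothetical global redactLog switch and a hypothetical redact helper. The real redaction would come with the reporter mentioned above, and that API is not shown here.

```go
package main

import "fmt"

// redactLog is a hypothetical switch standing in for a log-redaction setting.
var redactLog = true

// redact hides user data when redaction is enabled.
func redact(s string) string {
	if redactLog {
		return "?"
	}
	return s
}

// multipleRowMutationsErr builds the error from the diff above, but passes
// the keys (which embed user data) through redact first.
func multipleRowMutationsErr(oneKey, anotherKey []byte) error {
	return fmt.Errorf("multiple row mutations added/mutated, one = %s, another = %s",
		redact(fmt.Sprintf("%x", oneKey)), redact(fmt.Sprintf("%x", anotherKey)))
}
```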

MyonKeminta · 2021-09-15T05:16:46Z

table/tables/mutation_checker.go

+	columnFieldMap := make(map[int64]*types.FieldType)
+	for id, col := range columnMap {
+		columnFieldMap[id] = &col.FieldType
+	}


Here's another one. How about adding a new version of DecodeRowToDatumMap that accepts the ColumnInfo map?

Small allocations may not hurt performance much directly, but they add GC pressure, so allocations on such hot paths should be treated very carefully IMO 🤔
And when a map or slice is necessary, consider specifying the capacity in the make call where possible, to avoid reallocation as it grows. For example, len(columnMap) can be used here as the initial capacity of columnFieldMap, if this new map is really needed.

...actually I'm afraid I'm overdoing it... or maybe we could do these kinds of optimizations in a new PR? @cfzjywxk what do you think?

How about adding a new version of DecodeRowToDatumMap that accepts the ColumnInfo map?

I think it would be less maintainable.
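The sketch below illustrates the capacity suggestion from this thread. It uses hypothetical trimmed-down fieldType and columnInfo types in place of types.FieldType and model.ColumnInfo. The size of the derived map is known in advance, so len(columnMap) is passed to make as a size hint.

```go
package main

// fieldType and columnInfo are hypothetical stand-ins for types.FieldType
// and model.ColumnInfo.
type fieldType struct{ Tp byte }

type columnInfo struct {
	ID        int64
	FieldType fieldType
}

// buildFieldMap derives the column-ID -> *fieldType map from columnMap.
// Passing len(columnMap) as the size hint lets make allocate enough room
// up front instead of growing the map while entries are inserted.
func buildFieldMap(columnMap map[int64]*columnInfo) map[int64]*fieldType {
	columnFieldMap := make(map[int64]*fieldType, len(columnMap))
	for id, col := range columnMap {
		// col is a pointer, so this points into the shared ColumnInfo
		// rather than into a per-iteration copy.
		columnFieldMap[id] = &col.FieldType
	}
	return columnFieldMap
}
```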

table/tables/mutation_checker.go

sessionctx/stmtctx/stmtctx.go

ekexium · 2021-09-22T02:46:14Z

/run-check_dev

MyonKeminta

@cfzjywxk PTAL, do you think the unit tests are enough now?

table/tables/mutation_checker_test.go

cfzjywxk · 2021-09-22T06:18:30Z

@cfzjywxk PTAL, do you think the unit tests are enough now?

We may need to design a specific mechanism to cover more combinations of columns and indexes, which does not seem easy to do in unit tests. The effectiveness and correctness tests need more detailed planning, and the two could be considered together.

For example, to test the clustered index, we've added something like https://github.com/pingcap/automated-tests/blob/master/ticases/clustered_index/dml/basic_generator.go to generate combinations.

cfzjywxk · 2021-09-22T06:23:25Z

As it's in the development branch, maybe we could merge this first and start making the detailed test plan? @MyonKeminta @ekexium What do you think?

MyonKeminta · 2021-09-22T06:28:33Z

🤔 I'm fine with it

ti-chi-bot added the release-note and size/XL labels Sep 9, 2021

ekexium requested review from cfzjywxk, lysu and MyonKeminta September 9, 2021 04:22

cfzjywxk reviewed Sep 9, 2021


ti-chi-bot requested a review from longfangsong September 9, 2021 12:51

cfzjywxk requested review from wjhuang2016 and youjiali1995 September 9, 2021 12:54

cfzjywxk added the require-LGT3 and sig/transaction labels Sep 9, 2021

wjhuang2016 reviewed Sep 13, 2021


MyonKeminta reviewed Sep 14, 2021


ekexium added 15 commits September 14, 2021 19:32

feat: check index key in AddRecord

aceb3d0

Signed-off-by: ekexium <ekexium@gmail.com>

feat: check for AddRecord/UpdateRecord

d01995b

Signed-off-by: ekexium <ekexium@gmail.com>

ignore deletion of indices if NeedRestoredData

8a75382

Signed-off-by: ekexium <ekexium@gmail.com>

check values of row mutations

11fa293

Signed-off-by: ekexium <ekexium@gmail.com>

style: fix naming

f5160e1

Signed-off-by: ekexium <ekexium@gmail.com>

skip when sh == 0

be15c78

Signed-off-by: ekexium <ekexium@gmail.com>

check in RemoveRecord

dca9d75

Signed-off-by: ekexium <ekexium@gmail.com>

modify license header

ee5a801

Signed-off-by: ekexium <ekexium@gmail.com>

test: add unit tests

ada4dc5

Signed-off-by: ekexium <ekexium@gmail.com>

add license header

5c1483a

Signed-off-by: ekexium <ekexium@gmail.com>

also truncate decoded mutation

4e72298

Signed-off-by: ekexium <ekexium@gmail.com>

tidy up

9e01dca

Signed-off-by: ekexium <ekexium@gmail.com>

refactor according to comments

1aab79e

Signed-off-by: ekexium <ekexium@gmail.com> Auto stash before rebase of "ft-data-inconsistency"

skip partitioned table

836f4e3

Signed-off-by: ekexium <ekexium@gmail.com>

save columnMap in stmtctx

5eeb405

ekexium force-pushed the assertion-in-tables branch from 0996a1a to 5eeb405 September 14, 2021 13:00

skip when sh == 0

6431cbe

Some implementations of MemBuffer don't support staging. We don't care about them for now

ekexium mentioned this pull request Sep 15, 2021

Defend against data inconsistency and improve its troubleshooting #26833


MyonKeminta reviewed Sep 15, 2021


ekexium force-pushed the assertion-in-tables branch from 5f4bbe0 to f868207 September 16, 2021 02:58

cfzjywxk reviewed Sep 16, 2021


sessionctx/stmtctx/stmtctx.go (outdated, resolved)

ekexium added 2 commits September 16, 2021 16:09

reuse a slice in a loop

2ee348c

save reusable maps in stmtctx

e448d49

ekexium force-pushed the assertion-in-tables branch from f868207 to e448d49 September 16, 2021 08:15

save reusable maps in txn option

220f0f3

ti-chi-bot added the size/XXL label and removed the size/XL label Sep 21, 2021

add unit test for checkIndexKeys

b90202c

ekexium force-pushed the assertion-in-tables branch from 1045e73 to b90202c September 21, 2021 14:30

MyonKeminta reviewed Sep 22, 2021


table/tables/mutation_checker_test.go (outdated, resolved)

table/tables/mutation_checker_test.go (resolved)

address comments in test

841044a

MyonKeminta approved these changes Sep 22, 2021

View reviewed changes

ti-chi-bot added the status/LGT1 label Sep 22, 2021

cfzjywxk approved these changes Sep 22, 2021


ti-chi-bot added the status/LGT2 label and removed the status/LGT1 label Sep 22, 2021

cfzjywxk merged commit 2bece44 into pingcap:ft-data-inconsistency Sep 22, 2021
