Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: Check mutations for single-row changes #27920

Merged
merged 21 commits into from
Sep 22, 2021

Conversation

ekexium
Copy link
Contributor

@ekexium ekexium commented Sep 9, 2021

What problem does this PR solve?

Issue Number: part of #26833

Problem Summary:

Reduce data-index inconsistency issues by checking whether single-row changes generate corrupted mutations.

What is changed and how it works?

What's Changed:

  • RemoveRecord creates a mem buffer stage, which is used to collect mutations generated by the operation
  • For AddRecord, UpdateRecord and RemoveRecord, check the consistency of its mutations before releasing its mem buffer stage

How it Works:

  • Record mutations must be consistent with the input
  • Values of index mutations must be consistent with the input

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please add a release note, or a 'None' if it is not needed.

@ti-chi-bot
Copy link
Member

ti-chi-bot commented Sep 9, 2021

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • MyonKeminta
  • cfzjywxk

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 9, 2021
table/tables/mutation_checker.go Outdated Show resolved Hide resolved
func CheckIndexConsistency(sessVars *variable.SessionVars, t *TableCommon,
dataAdded, dataRemoved []types.Datum, memBuffer kv.MemBuffer, sh kv.StagingHandle) error {
sc := sessVars.StmtCtx
if sh == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When will this happend?If it's unexpected should we print an error log here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite remember when it's needed. I will let it return an error and see if any test fails.

Copy link
Contributor Author

@ekexium ekexium Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some implementations of MemBuffer doesn't support staging. For example, the one in lightning:

func (mb *kvMemBuf) Staging() kv.StagingHandle {
return 0
}

I think we can just ignore them.

table/tables/mutation_checker.go Outdated Show resolved Hide resolved
table/tables/mutation_checker.go Outdated Show resolved Hide resolved
table/tables/mutation_checker.go Outdated Show resolved Hide resolved
return errors.Trace(err)
}
if cmp != 0 {
logutil.BgLogger().Error("inconsistent row mutation", zap.String("decoded datum", decodedDatum.String()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the error report we could use reporter from #27388, and inconsistency events could traced by the transaction event.
/cc @MyonKeminta @longfangsong

func checkIndexKeys(sc *stmtctx.StatementContext, sessVars *variable.SessionVars, t *TableCommon,
dataAdded []types.Datum, dataRemoved []types.Datum, mutations []mutation) error {
indexIDMap := make(map[int64]indexHelperInfo)
for _, index := range t.indices {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the clustered primary index be skipped here? If so we could rename it checkSecondaryIndexKeys

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be skipped.
I think there are still cases where the (non-clustered) primary indices need to be checked?

Comment on lines 101 to 138
if len(m.value) == 0 && NeedRestoredData(indexHelperInfo.indexInfo.Columns, t.Meta().Columns) {
continue
}

decodedIndexValues, err := tablecodec.DecodeIndexKV(m.key, m.value, len(indexHelperInfo.indexInfo.Columns),
tablecodec.HandleNotNeeded, indexHelperInfo.rowColInfos)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will index key utilities like expression index, collations or row formats introduce corner cases here?
Need help /cc @lysu @wjhuang2016

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think it's fine here.

@cfzjywxk cfzjywxk added require-LGT3 Indicates that the PR requires three LGTM. sig/transaction SIG:Transaction labels Sep 9, 2021
Comment on lines 101 to 138
if len(m.value) == 0 && NeedRestoredData(indexHelperInfo.indexInfo.Columns, t.Meta().Columns) {
continue
}

decodedIndexValues, err := tablecodec.DecodeIndexKV(m.key, m.value, len(indexHelperInfo.indexInfo.Columns),
tablecodec.HandleNotNeeded, indexHelperInfo.rowColInfos)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think it's fine here.

}
mutations := collectTableMutationsFromBufferStage(t, memBuffer, sh)
if err := checkRowAdditionConsistency(sessVars, t.Meta().Columns, dataAdded, mutations); err != nil {
return errors.Trace(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to use Trace if the err is generated by errors.New

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The errors are temporary. They might change after #27388 is merged

@ekexium
Copy link
Contributor Author

ekexium commented Sep 14, 2021

/run-all-tests

Copy link
Contributor

@MyonKeminta MyonKeminta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the comments seems outdated... but please still take a look

table/tables/mutation_checker.go Outdated Show resolved Hide resolved
}

func collectTableMutationsFromBufferStage(t *TableCommon, memBuffer kv.MemBuffer, sh kv.StagingHandle) []mutation {
mutations := make([]mutation, 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's possible to make the membuffer support getting the size of the current stage, so that we can reserve enough space and allocate exactly once. But the change might be too much for a single PR.

table/tables/mutation_checker.go Outdated Show resolved Hide resolved
table/tables/mutation_checker.go Outdated Show resolved Hide resolved
table/tables/mutation_checker.go Outdated Show resolved Hide resolved
Comment on lines 155 to 158
columnMap := make(map[int64]*model.ColumnInfo)
for _, col := range tableColumns {
columnMap[col.ID] = col
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks too expensive if we do this for each row... If this is really necessary, can we try to store it somewhere like the sessionctx, to make it reusable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to save it in the stmtctx. PTAL if it's reasonable

Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>
Signed-off-by: ekexium <ekexium@gmail.com>

Auto stash before rebase of "ft-data-inconsistency"
Signed-off-by: ekexium <ekexium@gmail.com>
Some implementations of MemBuffer doesn't support staging. We don't care about them for now
@ekexium
Copy link
Contributor Author

ekexium commented Sep 15, 2021

/run-all-tests

1 similar comment
@ekexium
Copy link
Contributor Author

ekexium commented Sep 15, 2021

/run-all-tests

if rowInsertion.key == nil {
rowInsertion = m
} else {
err = errors.Errorf("multiple row mutations added/mutated, one = %+v, another = %+v", rowInsertion, m)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be careful that the data may need to be redacted, if you are going to print the error to log

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The errors will be replaced with the reporter (in the following PR, I suppose). I think we can handle redactions there.

Comment on lines 181 to 184
columnFieldMap := make(map[int64]*types.FieldType)
for id, col := range columnMap {
columnFieldMap[id] = &col.FieldType
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's another one. How about add a new version of DecodeRowToDatumMap that accepts the ColumnInfo map?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the small allocations doesn't have much direct affect to the performance, but it increases the GC pressure, so that allocations on such very-frequent paths should be very carefully treated IMO 🤔
And when map or slice is necessary, consider specifying the capacity in the make statement if possible, to reduce the potential reallocation when expanding. For example, here len(columnMap) can be used as the initial capacity of columnFieldmap, if this new map is really needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

..actually I'm afraid if I'm overdoing... or may we do these kind of optimizations in a new PR...? @cfzjywxk how do you think...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about add a new version of DecodeRowToDatumMap that accepts the ColumnInfo map?

I think it would be less maintainable.

table/tables/mutation_checker.go Outdated Show resolved Hide resolved
table/tables/mutation_checker.go Show resolved Hide resolved
table/tables/mutation_checker.go Outdated Show resolved Hide resolved
@ti-chi-bot ti-chi-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 21, 2021
@ekexium
Copy link
Contributor Author

ekexium commented Sep 22, 2021

/run-check_dev

Copy link
Contributor

@MyonKeminta MyonKeminta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cfzjywxk PTAL, do you think the unit tests are enough now?

table/tables/mutation_checker_test.go Outdated Show resolved Hide resolved
table/tables/mutation_checker_test.go Show resolved Hide resolved
@cfzjywxk
Copy link
Contributor

@cfzjywxk PTAL, do you think the unit tests are enough now?

We may need to design a specific mechanism to cover more combinations of columns and indexes, which seems not easy to be done in the unit-test. The effectiveness and correctness tests need more detailed planning, and they could be considered together.

For example to test the clustered index, we've added something like https://github.com/pingcap/automated-tests/blob/master/ticases/clustered_index/dml/basic_generator.go to generate combinations.

@cfzjywxk
Copy link
Contributor

As it's in the development branch, maybe we could merge this first and start to make the detailed test plan ? @MyonKeminta @ekexium What do you think?

@MyonKeminta
Copy link
Contributor

🤔 I'm fine with it

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Sep 22, 2021
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Sep 22, 2021
@cfzjywxk cfzjywxk merged commit 2bece44 into pingcap:ft-data-inconsistency Sep 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note Denotes a PR that will be considered when it comes time to generate release notes. require-LGT3 Indicates that the PR requires three LGTM. sig/transaction SIG:Transaction size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants