statistics: ease the impact of stats feedback on cluster (#15503) #18769
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
cherry-pick #15503 to release-2.1
What problem does this PR solve?
Fix #17478
Problem Summary:
Statistics feedback would impose periodical read/write burden on the database. Each TiDB would dump the feedbacks collected on this instance into TiKV every 10 mins, and the stats owner TiDB instance would read the feedbacks dumped every 15 seconds. If the stats owner TiDB finds there are new feedbacks from TiKV, it would merge the feedbacks with statistics in cache, then dump all these updated statistics into TiKV. This dump operation is pretty heavy if there are bunches of feedbacks on bunches of columns/indexes, since it can be treated as a light-weight ANALYZE on a lot of tables.
What is changed and how it works?
What's Changed:
First, reduce the amount of feedbacks generated on each TiDB by:
MaxQueryFeedbackCount
;Second, merge multiple insert/update/delete statements of dumping statistics into single ones, to reduce the unnecessary function call stacks and RPCs.How it Works:
Obviously, the first change can flow-control the statistics feedback mechanism fundamentally, but we may lose some stats accuracy incurred by feedback theoretically.
The second change combines several small transactions into a big one, since we have controlled the amount of feedbacks using the first change, I guess this bigger transaction is supposed not to be a problem. However, the bigger transaction should have higher chances of write conflict, and it makes code harder to read, so I haven't made up my mind to keep it or not actually.Related changes
Check List
Tests
Side effects
Release note