Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: add global variable tidb_schema_version_cache_limit to control infoschema cache size #46558

Merged
merged 10 commits into from
Sep 8, 2023

Conversation

crazycs520
Copy link
Contributor

@crazycs520 crazycs520 commented Aug 31, 2023

What problem does this PR solve?

Issue Number: close #46524

Problem Summary: add global variable tidb_schema_version_cache_limit to control infoschema cache size and add/refine metric.

What is changed and how it works?

To fix #46524, we can simply enlarge the info-scheme cache capacity from 16 to 64. However, in scenarios with a large number of tables, this increases the risk of TiDB OOM.

So this PR just add global variable tidb_schema_version_cache_limit to control infoschema cache size and refine metric in grafana to better diagnose this issue later. If you meet issue in #46524 with same root cause, you can enlarge the global variable tidb_schema_version_cache_limit to fix it.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  1. Deploy a TiDB cluster.
  2. run stale read load.
sysbench --config-file=sysbench.conf mussel_minimal.lua --threads=20 --read_staleness=-5 run
  1. run DDL load by loadgen. Following load will use 1 thread to continuously create tables, from t1 to t100000.
loadgen bench --sql 'create table t#seq-val (id int key, b int)' --thread=1 --valmin=1 --valmax=100000 --db=test2
  1. check the metrics.
image
  • 17:12: Start running stale-read load, and the query QPS is 18000.
  • 17:14: Start running DDL load, then the query QPS is dropping to 400. From the Load Schema OPS metrics, you can see many load snapshot schema OPS, this is what causes QPS to drop.
  • 17:18: set @@global. tidb_schema_version_cache_limit=64;, then the query QPS return to 18000.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • N/A

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

… in grafana

Signed-off-by: crazycs520 <crazycs520@gmail.com>
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 31, 2023
@tiprow
Copy link

tiprow bot commented Aug 31, 2023

Hi @crazycs520. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hawkingrei
Copy link
Member

/ok-to-test

@ti-chi-bot ti-chi-bot bot added the ok-to-test Indicates a PR is ready to be tested. label Aug 31, 2023
@hawkingrei
Copy link
Member

@crazycs520 Can you add the monitoring screenshots to PR description?

@codecov
Copy link

codecov bot commented Aug 31, 2023

Codecov Report

Merging #46558 (c8f5b20) into master (fa0c5ce) will decrease coverage by 0.6223%.
The diff coverage is 100.0000%.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #46558        +/-   ##
================================================
- Coverage   73.3027%   72.6804%   -0.6224%     
================================================
  Files          1322       1343        +21     
  Lines        396235     402539      +6304     
================================================
+ Hits         290451     292567      +2116     
- Misses        87261      91413      +4152     
- Partials      18523      18559        +36     
Flag Coverage Δ
integration 27.7664% <54.5454%> (?)
unit 73.3000% <100.0000%> (-0.0027%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 54.0444% <ø> (ø)
parser 84.9609% <ø> (ø)
br 48.2621% <ø> (-4.3561%) ⬇️
📢 Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pingcap).

Signed-off-by: crazycs520 <crazycs520@gmail.com>
@crazycs520 crazycs520 requested a review from xhebox August 31, 2023 09:28
@crazycs520
Copy link
Contributor Author

@xhebox @tangenta PTAL

Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Sep 1, 2023
… cache size

Signed-off-by: crazycs520 <crazycs520@gmail.com>
@crazycs520 crazycs520 requested a review from a team as a code owner September 1, 2023 08:51
@ti-chi-bot ti-chi-bot bot removed the approved label Sep 1, 2023
@crazycs520 crazycs520 changed the title domain: enlarge infoCache capacity to avoid cache miss and add metric in grafana *: add global variable tidb_info_schema_cache_size to control infoschema cache size and add/refine metric Sep 1, 2023
Signed-off-by: crazycs520 <crazycs520@gmail.com>
@crazycs520
Copy link
Contributor Author

@xhebox @zimulala @cfzjywxk PTAL

Signed-off-by: crazycs520 <crazycs520@gmail.com>
@cfzjywxk
Copy link
Contributor

cfzjywxk commented Sep 1, 2023

Need to discuss with @easonn7 first as there's session variable changes.

@crazycs520 crazycs520 changed the title *: add global variable tidb_info_schema_cache_size to control infoschema cache size and add/refine metric *: add global variable tidb_info_schema_cache_size to control infoschema cache size Sep 5, 2023
@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 5, 2023
Copy link
Contributor

@zimulala zimulala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Sep 5, 2023
@ti-chi-bot
Copy link

ti-chi-bot bot commented Sep 5, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-09-01 04:51:26.658617258 +0000 UTC m=+2075451.207633230: ☑️ agreed by xhebox.
  • 2023-09-05 06:52:46.594496967 +0000 UTC m=+2428331.143512953: ☑️ agreed by zimulala.

@crazycs520
Copy link
Contributor Author

/test unit-test

@tiprow
Copy link

tiprow bot commented Sep 5, 2023

@crazycs520: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test tiprow_fast_test

Use /test all to run all jobs.

In response to this:

/test unit-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Signed-off-by: crazycs520 <crazycs520@gmail.com>
@crazycs520 crazycs520 changed the title *: add global variable tidb_info_schema_cache_size to control infoschema cache size *: add global variable tidb_schema_version_cache_limit to control infoschema cache size Sep 6, 2023
@crazycs520
Copy link
Contributor Author

/test tiprow_fast_test

@ti-chi-bot
Copy link

ti-chi-bot bot commented Sep 6, 2023

@crazycs520: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test build
  • /test canary-scan-security
  • /test check-dev
  • /test check-dev2
  • /test mysql-test
  • /test pingcap/tidb/canary_ghpr_unit_test
  • /test pull-br-integration-test
  • /test pull-common-test
  • /test pull-e2e-test
  • /test pull-integration-common-test
  • /test pull-integration-copr-test
  • /test pull-integration-ddl-test
  • /test pull-integration-jdbc-test
  • /test pull-integration-mysql-test
  • /test pull-integration-prisma-test
  • /test pull-mysql-connector-test
  • /test pull-sqllogic-test
  • /test pull-tiflash-test
  • /test unit-test

The following commands are available to trigger optional jobs:

  • /test pull-notify-when-compatibility-sections-changed

Use /test all to run the following jobs that were automatically triggered:

  • pingcap/tidb/ghpr_build
  • pingcap/tidb/ghpr_check
  • pingcap/tidb/ghpr_check2
  • pingcap/tidb/ghpr_mysql_test
  • pingcap/tidb/ghpr_unit_test
  • pull-notify-when-compatibility-sections-changed

In response to this:

/test tiprow_fast_test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@easonn7
Copy link

easonn7 commented Sep 6, 2023

/approve

In the future, we need to enhance memory tracing to avoid OOM issues.

@ti-chi-bot ti-chi-bot bot added the approved label Sep 6, 2023
@crazycs520
Copy link
Contributor Author

/test all

@ti-chi-bot
Copy link

ti-chi-bot bot commented Sep 7, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cfzjywxk, easonn7, xhebox, zimulala

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cfzjywxk
Copy link
Contributor

cfzjywxk commented Sep 7, 2023

/merge

@ti-chi-bot
Copy link

ti-chi-bot bot commented Sep 7, 2023

@cfzjywxk: We have migrated to builtin LGTM and approve plugins for reviewing.

Please use /approve when you want approve this pull request.

The changes announcement: LGTM plugin changes

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@crazycs520
Copy link
Contributor Author

/test all

@hawkingrei
Copy link
Member

/retest

1 similar comment
@crazycs520
Copy link
Contributor Author

/retest

@crazycs520
Copy link
Contributor Author

/ok-to-merge

@crazycs520
Copy link
Contributor Author

/test build

@tiprow
Copy link

tiprow bot commented Sep 8, 2023

@crazycs520: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test tiprow_fast_test

Use /test all to run all jobs.

In response to this:

/test build

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ti-chi-bot ti-chi-bot bot merged commit ca425ea into pingcap:master Sep 8, 2023
5 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

stale read QPS is affected by DDL load
6 participants