Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schedule: add check action when poll the opeators from opNotifierQueue #8010

Merged
merged 6 commits into from
Apr 3, 2024

Conversation

TonsnakeLin
Copy link
Contributor

What problem does this PR solve?

Issue Number: Close #7992

What is changed and how does it work?

It add light check for polling the opeators from Controller::opNotifierQueue. 
For example, check whether the region has disappeared,check whether the version of region epoch has changed for merge-region operator.
It has the chance to cancel the operators from Controller::opNotifierQueue execept timeout. We can only cancel the operator by timeout mechanism before.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Code changes

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

Release note

None.

Signed-off-by: TonsnakeLin <michael.p@pingcap.com.cn>
Copy link
Contributor

ti-chi-bot bot commented Apr 1, 2024

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • disksing
  • rleungx

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-triage-completed release-note-none Denotes a PR that doesn't merit a release note. labels Apr 1, 2024
Copy link
Contributor

ti-chi-bot bot commented Apr 1, 2024

Hi @TonsnakeLin. Thanks for your PR.

I'm waiting for a tikv member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ti-chi-bot ti-chi-bot bot requested review from lhy1024 and rleungx April 1, 2024 06:43
@ti-chi-bot ti-chi-bot bot added needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 1, 2024
Signed-off-by: TonsnakeLin <michael.p@pingcap.com.cn>
@TonsnakeLin
Copy link
Contributor Author

/cc @disksing @Connor1996

@TonsnakeLin
Copy link
Contributor Author

/test ?

Copy link
Contributor

ti-chi-bot bot commented Apr 1, 2024

@TonsnakeLin: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ti-chi-bot ti-chi-bot bot added the status/LGT1 Indicates that a PR has LGTM 1. label Apr 1, 2024
pkg/schedule/operator/operator_controller.go Outdated Show resolved Hide resolved
// But to be cautions, it only takes effect on merge-region currently.
// If the version of epoch is changed, the region has been splitted or merged, and the key range has been changed.
// The changing for conf_version of epoch doesn't modify the region key range, skipt it.
if (op.Kind()&OpMerge != 0) && region.GetRegionEpoch().GetVersion() != op.RegionEpoch().GetVersion() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to compare the epoch like checkStaleOperator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkStaleOperator only check the ConfVersion of region epoch which maybe not resolve the issue #7992. For example, "operator add split-region 258 --policy="approximate" will change the version of region 258, but not for ConfVersion.
Another reason is I don't know why limits the checkStaleOperator only to DispatchFromHeartBeat source https://github.com/tikv/pd/blob/master/pkg/schedule/operator/operator_controller.go#L120

Copy link
Member

@rleungx rleungx Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, is it enough to only check the version? Or we also need to check both the version and conf version?

Copy link
Contributor Author

@TonsnakeLin TonsnakeLin Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At begining, I checked both the version and conf-version. But I think the conf-version changed doesn't mean the key-range changed, the merge operator can still be executed sucessfully. At last I check only the version of RegionEpoch.
um... I'm a little not sure whether to consider conf-version.
Could you give me some suggestion ? @disksing @rleungx , Thank you!!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I think there will be a problem if we use != since the operator's epoch won't change after creating.

@rleungx
Copy link
Member

rleungx commented Apr 1, 2024

/ok-to-test

@ti-chi-bot ti-chi-bot bot added ok-to-test Indicates a PR is ready to be tested. and removed needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels Apr 1, 2024
Signed-off-by: TonsnakeLin <michael.p@pingcap.com.cn>
Copy link

codecov bot commented Apr 1, 2024

Codecov Report

Merging #8010 (00cfdaf) into master (6fe44d7) will decrease coverage by 0.03%.
The diff coverage is 100.00%.

❗ Current head 00cfdaf differs from pull request most recent head 6da6e25. Consider uploading reports for the commit 6da6e25 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8010      +/-   ##
==========================================
- Coverage   73.60%   73.57%   -0.03%     
==========================================
  Files         436      436              
  Lines       48516    48512       -4     
==========================================
- Hits        35709    35694      -15     
- Misses       9743     9755      +12     
+ Partials     3064     3063       -1     
Flag Coverage Δ
unittests 73.57% <100.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

TonsnakeLin added 2 commits April 1, 2024 21:57
Signed-off-by: TonsnakeLin <michael.p@pingcap.com.cn>
Signed-off-by: TonsnakeLin <michael.p@pingcap.com.cn>
@ti-chi-bot ti-chi-bot bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Apr 2, 2024
@rleungx
Copy link
Member

rleungx commented Apr 3, 2024

/merge

Copy link
Contributor

ti-chi-bot bot commented Apr 3, 2024

@rleungx: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Copy link
Contributor

ti-chi-bot bot commented Apr 3, 2024

This pull request has been accepted and is ready to merge.

Commit hash: 00cfdaf

@ti-chi-bot ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label Apr 3, 2024
@ti-chi-bot ti-chi-bot bot merged commit a2b0e3c into tikv:master Apr 3, 2024
22 checks passed
@ti-chi-bot ti-chi-bot added the needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. label Apr 15, 2024
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this pull request Apr 15, 2024
close tikv#7992

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-7.1: #8066.

@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request could not be created: failed to create pull request against tikv/pd#release-7.5 from head ti-chi-bot:cherry-pick-8010-to-release-7.5: the GitHub API request returns a 403 error: {"message":"You have exceeded a secondary rate limit and have been temporarily blocked from content creation. Please retry your request again later. If you reach out to GitHub Support for help, please include the request ID 98EE:21F144:5006CCD:80D2AFD:6639C15F and timestamp 2024-05-07 05:51:28 UTC.","documentation_url":"https://docs.github.com/rest/overview/rate-limits-for-the-rest-api#about-secondary-rate-limits"}

@ti-chi-bot ti-chi-bot added the needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. label May 13, 2024
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this pull request May 13, 2024
close tikv#7992

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-6.5: #8170.

@TonsnakeLin
Copy link
Contributor Author

/cherry-pick release-7.5

@ti-chi-bot
Copy link
Member

@TonsnakeLin: new pull request created to branch release-7.5: #8179.

In response to this:

/cherry-pick release-7.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot bot pushed a commit that referenced this pull request May 20, 2024
#8010) (#8179)

close #7992

Signed-off-by: TonsnakeLin <michael.p@pingcap.com.cn>

Co-authored-by: TonsnakeLin <michael.p@pingcap.com.cn>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

placement-rule: it can't transfer leader to the target store
4 participants