After performing an online recovery, "halt-scheduling" has been set to true when reloading pd #8095

Closed
mayjiang0203 opened this issue Apr 18, 2024 · 5 comments · Fixed by #8147
Labels
affects-7.5, affects-8.1, report/customer (Customers have encountered this bug.), severity/major, type/bug (The issue is confirmed as a bug.)

Comments

@mayjiang0203

Bug Report

What did you do?

What did you expect to see?

After the online recovery has finished and PD has been reloaded, "halt-scheduling" should be set to false.

What did you see instead?

[2024/04/18 16:16:08.515 +08:00] [INFO] [cluster.go:1093] ["will run cmd"] [cmd:="tiup ctl:v8.1.0-pre pd -u http://pd3-peer.dr-auto-sync-8c12tikv-tps-7567843-1-466:2379 unsafe remove-failed-stores show"]
  {
    "info": "Unsafe recovery Finished",
    "time": "2024-04-18 16:15:42.491",
[2024/04/18 16:16:22.872 +08:00] [INFO] [cmd.go:197] ["Remote command finished"] [cmd="tiup cluster reload tidbcluster -R pd -y"] [exitcode=0] []
[2024/04/18 16:16:24.293 +08:00] [INFO] [pdutil.go:512] ["run pd ctl command"] [pdCmd="tiup ctl:v8.1.0-pre pd -u http://pd3-peer.dr-auto-sync-8c12tikv-tps-7567843-1-466:2379 config show all"]

What version of PD are you using (pd-server -V)?

v8.1.0

[2024/04/18 15:15:22.453 +08:00] [INFO] [workloadnode.run] [util.go:255] ["/tiup/deploy/pd-/bin/pd-server -V"] [workload=pd2]
[2024/04/18 15:15:22.455 +08:00] [INFO] [cmd.go:150] ["Start remote command"] [cmd="/tiup/deploy/pd-/bin/pd-server -V"] [nodename=pd2]
2024-04-18T15:15:22.455+0800 INFO k8s/client.go:223 it should be noted that a long-running command will not be interrupted even the use case has ended. For more information, please refer to https://github.com/pingcap/test-infra/discussions/129
Release Version: v8.1.0
Edition: Community
Git Commit Hash: 3ec92bd
Git Branch: HEAD
UTC Build Time: 2024-04-15 03:59:49

@mayjiang0203 added the type/bug label (The issue is confirmed as a bug.) on Apr 18, 2024
@mayjiang0203
Author

mayjiang0203 commented Apr 18, 2024

/severity major
/label affects-8.1
/label affects-7.1
/label affects-7.5
/remove-label may-affects-7.5
/remove-label may-affects-7.1
/remove-label may-affects-6.5
/remove-label may-affects-6.1
/remove-label may-affects-5.4

Contributor

ti-chi-bot bot commented Apr 19, 2024

@mayjiang0203: These labels are not set on the issue: affects-7.5, affects-7.1, affects-6.5, affects-6.1, affects-5.4.

In response to this:

/severity major
/label affects-8.1
/remove-label affects-7.5
/remove-label affects-7.1
/remove-label affects-6.5
/remove-label affects-6.1
/remove-label affects-5.4

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Contributor

ti-chi-bot bot commented Apr 23, 2024

@mayjiang0203: These labels are not set on the issue: may-affects-7.5, may-affects-7.1, may-affects-6.5, may-affects-6.1, may-affects-5.4.

In response to this:

/severity major
/label affects-8.1
/label affects-7.1
/label affects-7.5
/remove-label may-affects-7.5
/remove-label may-affects-7.1
/remove-label may-affects-6.5
/remove-label may-affects-6.1
/remove-label may-affects-5.4


@mayjiang0203
Author

Impact of this bug: reloading the cluster becomes very slow. Leader eviction no longer works while scheduling is halted, so each TiKV restart has to wait for a 10-minute timeout.
Workaround: reload PD first, then run "config set halt-scheduling false" through pd-ctl. After that, the cluster can be reloaded normally.
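
The workaround order can be sketched as a small script. This is only an illustration of the sequence described in the comment above, not an official procedure. The cluster name "tidbcluster" and the ctl version "v8.1.0-pre" are taken from the logs in this report; the PD address default, the variable names, the run helper and the DRY_RUN switch are assumptions added for the example.

```shell
#!/bin/sh
# Sketch of the workaround order; adjust the placeholders for your deployment.
#   CLUSTER     - TiUP cluster name ("tidbcluster" comes from the logs above)
#   PD_ADDR     - address of a PD endpoint (hypothetical default)
#   CTL_VERSION - tiup ctl component version ("v8.1.0-pre" comes from the logs above)
CLUSTER="${CLUSTER:-tidbcluster}"
PD_ADDR="${PD_ADDR:-http://127.0.0.1:2379}"
CTL_VERSION="${CTL_VERSION:-v8.1.0-pre}"

# Hypothetical helper: with DRY_RUN=1 (the default here) the command is only
# printed; with DRY_RUN=0 it is executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

workaround() {
    # 1. Reload only the PD nodes first.
    run tiup cluster reload "$CLUSTER" -R pd -y || return 1
    # 2. Clear the halt flag that was left set to true.
    run tiup "ctl:$CTL_VERSION" pd -u "$PD_ADDR" config set halt-scheduling false || return 1
    # 3. Reload the cluster; leader eviction should work again at this point.
    run tiup cluster reload "$CLUSTER" -y || return 1
}

workaround
```

As written, the script only prints the three commands in order. Setting DRY_RUN=0 would execute them, which should only be done after checking the placeholders against the real deployment.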

@ti-chi-bot ti-chi-bot bot closed this as completed in #8147 May 8, 2024
ti-chi-bot bot added a commit that referenced this issue May 8, 2024
…8147)

ref #6493, close #8095

Individually check the scheduling halt for online unsafe recovery to avoid unexpectedly persisting the halt option in the intermediate process.

Signed-off-by: JmPotato <ghzpotato@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot bot pushed a commit that referenced this issue May 9, 2024
…8147) (#8155)

ref #6493, close #8095

Individually check the scheduling halt for online unsafe recovery to avoid unexpectedly persisting the halt option in the intermediate process.

Signed-off-by: JmPotato <ghzpotato@gmail.com>

Co-authored-by: JmPotato <ghzpotato@gmail.com>
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue May 20, 2024
ref tikv#6493, close tikv#8095

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue May 20, 2024
ref tikv#6493, close tikv#8095

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot bot pushed a commit that referenced this issue May 22, 2024
…8147) (#8194)

ref #6493, close #8095

Individually check the scheduling halt for online unsafe recovery to avoid unexpectedly persisting the halt option in the intermediate process.

Signed-off-by: JmPotato <ghzpotato@gmail.com>

Co-authored-by: JmPotato <ghzpotato@gmail.com>
Co-authored-by: lhy1024 <liuhanyang@pingcap.com>
@seiya-annie

/found customer

@ti-chi-bot ti-chi-bot bot added the report/customer label (Customers have encountered this bug.) on Jun 11, 2024