Limit Restart rate for pd and tikv in admission webhook #1532

Merged (8 commits) on Jan 15, 2020

@Yisaer Yisaer commented Jan 9, 2020

What problem does this PR solve?

Currently, the restarter restarts a pod as soon as possible, which is risky for tikv and pd pods.

With this PR, when the admission webhook receives a delete request for a tikv or pd pod annotated with tidb.pingcap.com/pod-defer-deleting, it checks whether any former tikv or pd pod is also annotated with tidb.pingcap.com/pod-defer-deleting. If such a pod exists, the request is rejected.
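
The check described above can be sketched roughly as follows. This is a minimal illustration and not the PR's code: it assumes that "former" means a pod with a higher ordinal (as the loop in the reviewed function suggests) and that the mere presence of the annotation marks a pending restart. The `pod` type and the `formerPodPendingRestart` function are hypothetical stand-ins for the real Kubernetes objects.

```go
package main

// deferDeletingAnn is the annotation key named in the PR description.
const deferDeletingAnn = "tidb.pingcap.com/pod-defer-deleting"

// pod is a hypothetical stand-in for a Kubernetes Pod; only the fields
// needed for this illustration are modelled.
type pod struct {
	ordinal     int32
	annotations map[string]string
}

// formerPodPendingRestart reports whether any pod with an ordinal higher
// than the given one already carries the defer-deleting annotation.
// Assumption: the presence of the key alone marks a pending restart.
func formerPodPendingRestart(pods []pod, ordinal int32) bool {
	for _, p := range pods {
		if p.ordinal <= ordinal {
			continue // only pods with a higher ordinal are considered
		}
		if _, ok := p.annotations[deferDeletingAnn]; ok {
			return true
		}
	}
	return false
}
```

In this sketch a webhook would reject the delete request whenever the function returns true, so that at most one instance is restarted at a time.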

Does this PR introduce a user-facing change?:

Limit the restart rate for pd and tikv: only one instance is restarted at a time

@Yisaer Yisaer added area/webhook Related to webhook enhancement New feature or request labels Jan 9, 2020

aylei commented Jan 9, 2020

please add a release note for this enhancement

aylei commented Jan 9, 2020

Does this need to be cherry-picked to 1.1?

Yisaer commented Jan 10, 2020

> please add a release note for this enhancement

This is not a user-facing change. Is it necessary to add it to the release notes? @aylei

@Yisaer Yisaer requested a review from cofyc January 15, 2020 03:40
aylei previously approved these changes Jan 15, 2020

@aylei aylei left a comment

LGTM, please add a descriptive release note

@Yisaer Yisaer requested a review from aylei January 15, 2020 04:24
@Yisaer Yisaer changed the title Add Restarter Limit for pd and tikv in admission webhook Limit Restart rate for pd and tikv in admission webhook Jan 15, 2020
// checkFormerPodRestartStatus checks whether any former pod is going to be restarted.
// It returns true if such a pod exists.
func checkFormerPodRestartStatus(kubeCli kubernetes.Interface, memberType v1alpha1.MemberType, tc *v1alpha1.TidbCluster, namespace string, ordinal int32, replicas int32) (bool, error) {
for i := replicas - 1; i > ordinal; i-- {

@cofyc cofyc Jan 15, 2020

use helper.GetPodOrdinals to get the desired ordinals of the pods instead:

for id := range helper.GetPodOrdinals(tc.Status.TiDB.StatefulSet.Replicas, set) {

because the ordinals may not be consecutive with AdvancedStatefulset
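
To illustrate why a plain `for` loop over `0..replicas-1` can miss pods, here is a small sketch of how a non-consecutive ordinal set might be derived. It is not the implementation of helper.GetPodOrdinals: the function name `desiredOrdinals`, its signature, and the idea of passing skipped ordinals as a `deleteSlots` map are assumptions made for illustration only.

```go
package main

// desiredOrdinals is a hypothetical illustration, not helper.GetPodOrdinals.
// It returns the first `replicas` non-negative ordinals that are not listed
// in deleteSlots, in ascending order.
func desiredOrdinals(replicas int32, deleteSlots map[int32]bool) []int32 {
	if replicas <= 0 {
		return nil
	}
	ordinals := make([]int32, 0, replicas)
	for i := int32(0); int32(len(ordinals)) < replicas; i++ {
		if !deleteSlots[i] { // reading a nil map is safe and yields false
			ordinals = append(ordinals, i)
		}
	}
	return ordinals
}
```

With 3 replicas and ordinal 1 skipped, the desired ordinals in this sketch are 0, 2 and 3, so a loop bounded by `replicas - 1` would never look at the pod with ordinal 3.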

@cofyc cofyc Jan 15, 2020

maybe we can list all pods of the statefulset? In this case, I think it's simpler because we need to get pod objects from the API server.

Ah, I was wrong: we need to check the desired pods and return an error if a pod is not created yet. Otherwise, pods can be deleted before the previous replica is recreated.

@Yisaer Yisaer Jan 15, 2020

I found that the pod admission webhook currently only considers pods controlled by apps.StatefulSet. Although it would receive delete requests for pods of an AdvancedStatefulset, some places do not support AdvancedStatefulset yet.

func getOwnerStatefulSetForTiDBComponent(pod *core.Pod, kubeCli kubernetes.Interface) (*apps.StatefulSet, error) {

I think this should be handled in a separate issue so that it can be addressed as a whole.

the kind of AdvancedStatefulset is StatefulSet too.

the only change needed here is to use helper.GetPodOrdinals to get the list of ordinals instead of the for loop.

the returned set has a List() method that returns a slice sorted in ascending order; all ordinals <= ordinal can be ignored in the loop
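
The two review points in this thread (skip every ordinal that is not higher than the pod being deleted, and fail if a desired pod does not exist yet) could be combined along these lines. This is a sketch under assumptions, not the merged code: `higherOrdinalRestartPending` is a hypothetical name, the sorted ordinal slice stands in for the result of List(), and the `existing` map (ordinal to annotations) stands in for lookups against the API server.

```go
package main

import "fmt"

// deferDeletingAnn is the annotation key named in the PR description.
const deferDeletingAnn = "tidb.pingcap.com/pod-defer-deleting"

// higherOrdinalRestartPending walks the desired ordinals (ascending) and
// ignores every entry that is not higher than `ordinal`. It returns an error
// if a desired higher-ordinal pod is missing from `existing`, and true if
// one of them carries the defer-deleting annotation.
func higherOrdinalRestartPending(sorted []int32, existing map[int32]map[string]string, ordinal int32) (bool, error) {
	for _, id := range sorted {
		if id <= ordinal {
			continue // these ordinals can be ignored, as noted in the review
		}
		annotations, found := existing[id]
		if !found {
			// The previous replica has not been recreated yet.
			return false, fmt.Errorf("pod with ordinal %d is not created yet", id)
		}
		if _, ok := annotations[deferDeletingAnn]; ok {
			return true, nil
		}
	}
	return false, nil
}
```

Returning an error for a missing pod keeps the webhook from admitting a delete while a higher-ordinal replica is still being recreated.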

@Yisaer Yisaer Jan 15, 2020

I wonder: if advancedStatefulSet is enabled, would the following method in pkg/webhook/pod/util.go get the owner advancedstatefulset?

func getOwnerStatefulSetForTiDBComponent(pod *core.Pod, kubeCli kubernetes.Interface) (*apps.StatefulSet, error) {
	name := pod.Name
	namespace := pod.Namespace
	var ownerStatefulSetName string
	for _, ownerReference := range pod.OwnerReferences {
		if ownerReference.Kind == "StatefulSet" {
			ownerStatefulSetName = ownerReference.Name
			break
		}
	}
	if len(ownerStatefulSetName) == 0 {
		return nil, fmt.Errorf(failToFindTidbComponentOwnerStatefulset, namespace, name)
	}
	return kubeCli.AppsV1().StatefulSets(namespace).Get(ownerStatefulSetName, meta.GetOptions{})
}

it will, because it does not check the APIGroup (the only difference between an advanced statefulset and a k8s statefulset is the APIGroup).
I'm fine with updating this later, because the admission controller does not support the --features flag yet.
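
For a later follow-up, the API group could be read from the owner reference's APIVersion field in addition to its Kind. The sketch below only illustrates that idea: `ownerRef` is a hypothetical stand-in that mirrors three fields of a Kubernetes owner reference, and `apiGroup` and `ownerStatefulSet` are illustrative names that do not exist in the operator.

```go
package main

import "strings"

// ownerRef is a hypothetical stand-in mirroring the APIVersion, Kind and
// Name fields of a Kubernetes owner reference.
type ownerRef struct {
	APIVersion string
	Kind       string
	Name       string
}

// apiGroup extracts the group from an apiVersion of the form "group/version".
// Core resources use a bare version such as "v1", which yields "".
func apiGroup(apiVersion string) string {
	if i := strings.Index(apiVersion, "/"); i >= 0 {
		return apiVersion[:i]
	}
	return ""
}

// ownerStatefulSet returns the name and API group of the first owner whose
// kind is StatefulSet, so that the caller can pick the matching client.
func ownerStatefulSet(refs []ownerRef) (name, group string, ok bool) {
	for _, r := range refs {
		if r.Kind == "StatefulSet" {
			return r.Name, apiGroup(r.APIVersion), true
		}
	}
	return "", "", false
}
```

A caller could then branch on the returned group: "apps" for the built-in StatefulSet, anything else for a custom implementation.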

@aylei aylei left a comment

LGTM

Please do not include an ordered-list marker at the beginning of the release note. Refer to https://github.com/pingcap/tidb-operator/blob/master/docs/release-note-guide.md for the release note language guide.

sre-bot commented Jan 15, 2020

cherry pick to release-1.1 in PR #1555

cofyc added a commit that referenced this pull request Jan 15, 2020
* Add restarter limit in webhook

* Update pkg/webhook/pod/util.go

Co-Authored-By: Yecheng Fu <cofyc.jackson@gmail.com>

Co-authored-by: Song Gao <disxiaofei@163.com>
Co-authored-by: Yecheng Fu <cofyc.jackson@gmail.com>
Labels
area/webhook Related to webhook enhancement New feature or request needs-cherry-pick-1.1