Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add EvictOptions struct to EvictPod() #885

Merged
merged 1 commit into from
Jul 20, 2022

Conversation

damemi
Copy link
Contributor

@damemi damemi commented Jul 11, 2022

Following up from #846 (comment), this proposes adding an EvictOptions struct, which can be used to optionally pass additional information to EvictPod().

Since EvictPod() is the public wrapper around the internal evictPod function which actually does the eviction, I think it makes sense to do logging/metrics in this function. An options struct allows us to give users the option to pass this info in a structured format, while abstracting that format enough to maintain a consistent function signature. So, this could be expanded in the future to allow more customization around EvictPod as well.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 11, 2022
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 11, 2022
@damemi damemi requested review from a7i and lixiang233 and removed request for lixiang233 July 11, 2022 15:07
@a7i
Copy link
Contributor

a7i commented Jul 11, 2022

/ok-to-test
/lgtm
/assign @ingvagabund

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Jul 11, 2022
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 11, 2022
Copy link
Member

@JaneLiuL JaneLiuL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very clear~~
/lgtm

}

// Eviction reason can be set through the EvictOptions struct
func (pe *PodEvictor) EvictPod(ctx context.Context, pod *v1.Pod, opts EvictOptions) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reminds me the client-go code function signatures:

Create(ctx context.Context, pod *v1.Pod, opts metav1.CreateOptions) (*v1.Pod, error)

+1 for passing a struct with options into the pod evictor method to pass any eviction control value. Nevertheless, I have still trouble accepting the need for passing the strategy name and the reason. Both are used in the post eviction step. Either for recording a metric (the actual eviction is skipped/ignored due to limits getting exceeded or the eviction errors/succeeds). Or, when the method klogs and/or broadcasts an event.

To clarify more on why we need both the strategy and the reason fields:

  • metrics: We use the strategy name since we are interested in knowing which strategy is responsible for which evictions. We might introduce a metric which will record the reasons if there's an interest.
  • events: This might be actually broken. An event is supposed to report which strategy was used for the eviction. The reason is currently not set. In the past the reason was set to the strategy name. Which I unintentionally broke. The expectation here is to use both the strategy and the reason and broadcast them to an evicted pod so a user knows why and by who a pod was evicted.
  • logs: We want to log which strategy evicted a pod (or why the eviction failed) and why the eviction was needed (if a reason is set)

From the perspective of a plugin developer/user I would like to minimize anything I need to set in order to invoke a pod eviction. I'd like the framework to take care of what needs to be set in order to record a metric or to broadcast an event. I'd be very interested in the logging part so I can log as many information I can in order to properly debug.

Based on that I'd like to minimize the signature which we will expose to plugin developers as part of the public API. Given the framework knows exactly which strategy is currently getting executed, it can take care of setting the right strategy name. A plugin developer is just expected to properly expose a plugin name (e.g. through a public constant when a plugin is registered). The framework will then configure a handle through which it will provide a simple signature. The handle can then take care of recording metrics, broadcasting events and other useful steps. E.g. for tracking or logging.

Saying all of that I do not challenge this particular change. I challenge the fact a strategy should be responsible for passing its name into the "I am responsible for evicting pods" method. My final idea for the framework is to do the following:

  • a plugin invokes a handle.Evictor().Evict(ctx context.Context, pod *v1.Pod, ...) method
  • when the framework is running a particular plugin, it injects the plugin name into the method. E.g. by changing its internal values
  • once a plugin invokes handle.Evictor().Evict(ctx context.Context, pod *v1.Pod, ...), the method will do the following:
    status := internal.PodEvictor.EvictPod(ctx, pod)
    if status.Err != nil {
      klog.ErrorS(err, "Error evicting pod", "pod", klog.KObj(pod))
      return
    }
    metrics.PodsEvicted.With(map[string]string{"result": status.Result, "strategy": internal.strategy, "namespace": pod.Namespace, "node": pod.Spec.NodeName}).Inc()
    klog.V(1).InfoS("Evicted pod", "pod", klog.KObj(pod), "reason", reason, "strategy", internal.strategy, "node", pod.Spec.NodeName)
    r.Event(pod, v1.EventTypeNormal, "Descheduled", fmt.Sprintf("pod evicted by sigs.k8s.io/descheduler%s", reason))
    

Based on my PoC, PodEvictor.EvictPod will be mostly inlined into handle.Evictor().Evict. The checks for the limits will be moved higher up as well. E.g. https://github.com/kubernetes-sigs/descheduler/pull/781/files#diff-67816189b07b98460b57a10b41280403c53a4dfda42fdd8f54475812cb99f56eR185-R199. All depends on whether we decide to follow this path. Or, if we decide on a different approach.

On the other hand a reason is quite specific to each plugin. Each plugin can have a different reasons for eviction depending on circumstances. E.g. "pod lifetime exceeded, too many restarts, node affinity violated, ...". Right now the reason is optimal. Making it required might force plugin developer to think more carefully about why a pod needs to be evicted. I was hoping to reuse a context for passing the reason. Since a pod is evicted in a certain context (by a strategy, from a given node, ...) and a reason further narrows down the context in which a pod is evicted.

Apologies, I do not have any definite answer. Just some thoughts to consider before we move on. @damemi @a7i I wonder what is your though on this approach. Passing a reason through a context does not feel quite right. Though passing it through the EvictOptions struct which I interpret as control options does not feel quite right either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@knelasevero knelasevero Jul 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for passing a struct with options into the pod evictor method to pass any eviction control value. Nevertheless, I have still trouble accepting the need for passing the strategy name and the reason. Both are used in the post eviction step. Either for recording a metric (the actual eviction is skipped/ignored due to limits getting exceeded or the eviction errors/succeeds). Or, when the method klogs and/or broadcasts an event.

I know the main discussion now is about name and reason for evictions, but I would like to know what do you think are suitable control values that should in fact be passed in EvictOptions? Would it be for some control logic that we don't have yet?

(sorry for the tangent)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reminds me the client-go code function signatures:

that was my inspiration for this. I figured the familiarity with that pattern, and users being accustomed to sending empty objects, would be a good point to look at this from

Given the framework knows exactly which strategy is currently getting executed, it can take care of setting the right strategy name. A plugin developer is just expected to properly expose a plugin name (e.g. through a public constant when a plugin is registered).

I agree with this point. Strategy name is probably unnecessary, and it should be exposed through the plugin code that the framework reads from. I think an interface with a Name() function is better for enforcing this.

On the other hand a reason is quite specific to each plugin. Each plugin can have a different reasons for eviction depending on circumstances. E.g. "pod lifetime exceeded, too many restarts, node affinity violated, ...".

This is the key reason for being able to pass a variable at eviction time. Not only could custom plugins have many reasons for eviction, they could have many different calls to Evict() (maybe not the best design, but that's up to the developer)

Right now the reason is optimal. Making it required might force plugin developer to think more carefully about why a pod needs to be evicted. I was hoping to reuse a context for passing the reason. Since a pod is evicted in a certain context (by a strategy, from a given node, ...) and a reason further narrows down the context in which a pod is evicted.

I don't think it needs to be required, since the strategy name could be descriptive enough. But either way, a context doesn't enforce any structure and errors or missing parameters will only be found at runtime. Context is intended to pass information across requests, which isn't what we're doing here.

Passing a reason through a context does not feel quite right. Though passing it through the EvictOptions struct which I interpret as control options does not feel quite right either.

The name "EvictOptions" isn't my favorite either, for these use cases I think they are passing more like eviction metadata. But the idea is that this could expand in the future to expose more knobs.

I would like to know what do you think are suitable control values that should in fact be passed in EvictOptions?

This is a great question, because the PodEvictor is already customizable with options functions. However, the difference is between compile-time options and runtime options.

A plugin developer will want to instantiate a PodEvictor in their code with certain eviction criteria built in. But when the plugin actually runs, information specific to that single run/eviction may be useful. Right now, it looks like just additional reasons are the only use case for that (ie pod evicted because it didn't tolerate taint Foo). My thinking is, looking ahead, we may want a handle here to expand those runtime options.

Copy link
Contributor

@ingvagabund ingvagabund Jul 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/kubernetes/typed/policy/v1beta1/eviction_expansion.go#L27:

type EvictionExpansion interface {
	Evict(ctx context.Context, eviction *policy.Eviction) error
}

and https://github.com/kubernetes/kubernetes/blob/ff20035ef8d305e5bec3866670b515d0c5c6a1c0/staging/src/k8s.io/api/policy/v1beta1/types.go#L172:

type Eviction struct {
	metav1.TypeMeta `json:",inline"`

	// ObjectMeta describes the pod that is being evicted.
	// +optional
	metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

	// DeleteOptions may be provided
	// +optional
	DeleteOptions *metav1.DeleteOptions `json:"deleteOptions,omitempty" protobuf:"bytes,2,opt,name=deleteOptions"`
}

We might follow the same pattern here:

type Eviction struct {
	metav1.TypeMeta `json:",inline"`

	// ObjectMeta describes the pod that is being evicted.
	// +optional
	metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

	// Reason describes cause of the eviction
	Reason string

	// EvictOptions may be provided
	// +optional
	EvictOptions *metav1.EvictOptions `json:"evictOptions,omitempty"`
}

func (pe *PodEvictor) EvictPod(ctx context.Context,  eviction *Eviction) bool

with the following invocation:

eviction := &Eviction{
	TypeMeta: metav1.TypeMeta{
		APIVersion: "descheduler/v1alpha2",
		Kind:       "Eviction",
	},
	ObjectMeta: metav1.ObjectMeta{
		Name:      pod.Name,
		Namespace: pod.Namespace,
	},
	Reason: "Too many pods on a node",
	EvictOptions: evictOptions,
}
handle.Evictor().Evict(ctx, eviction)

We can omit EvictOptions for the moment and introduce it later once needed. We might need TypeMeta for the generators or remove it if not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-implementing the upstream Eviction type seems unnecessarily complex for what we're accomplishing. The tradeoff in your proposal appears to be fewer function arguments, but those arguments are now just embedded in a struct. So the weight is essentially still there.

If we were to go this way, I would rather just use the upstream Eviction type directly, and still have a third parameter for Descheduler framework-specific options. ie:

handle.Evictor().Evict(ctx, *policy.Eviction, descheduler.EvictOptions)

So that the upstream k8s object is decoupled from Descheduler's framework types. What do you think about something like that?


Side note: I'm still not clear on the difference between having an EvictPod() internal function and the handle.Evictor().Evict() functions. Can we not just expose handle.Evictor().EvictPod() instead? Or is the idea that users can implement handle.Evictor().Evict() themselves as an Evictor plugin which calls EvictPod()? Just checking that I understand this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not clear on the difference between having an EvictPod() internal function and the handle.Evictor().Evict() functions. Can we not just expose handle.Evictor().EvictPod() instead? Or is the idea that users can implement handle.Evictor().Evict() themselves as an Evictor plugin which calls EvictPod()? Just checking that I understand this

Apologies for the confusion. handle.Evictor().Evict() is the same as EvictPod(). handle.Evictor().Evict() is the future way of invoking EvictPod().

Copy link
Contributor

@ingvagabund ingvagabund Jul 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tradeoff in your proposal appears to be fewer function arguments, but those arguments are now just embedded in a struct. So the weight is essentially still there.

I am aiming for a stable function signature which can stay unchanged. So plugin developers do not have to extend the signature whenever there's a new non-control option for EvictPod. I still have troubles passing the Reason through EvictOptions as the reason string does not change the way a pod is evicted. The same holds for any other non-controlling options which might prove useful in the future for e.g. logging more information, providing more information in an event object or recorded metrics.

If we were to go this way, I would rather just use the upstream Eviction type directly, and still have a third parameter for Descheduler framework-specific options. So that the upstream k8s object is decoupled from Descheduler's framework types.

From all the options we have discussed so far taking into account:

  • simplicity of the signature
  • decoupling descheduler types from the kubernetes types
  • passing non-control option to the EvictPod

I don't see any simple and transparent solution for separating control and non-control options. To avoid changing the signature with new non-control options while already providing a way of extending control options, it makes sense to me to keep the EvictOptions struct and go with EvictPod(ctx context.Context, pod *v1.Pod, opts EvictOptions) bool. On the other hand it will not fully mirror the client-go methods where the opts type is defining control options only.

Unless there's a better idea of how to properly decouple the control and non-control options while still keeping in mind all that has been discussed here, we can move forward.

@damemi thank You for the discussion. If you could please remove the Strategy string from EvictOptions and keep context.WithValue(ctx, "strategyName", string(name)) as it is. Once all the strategies are migrated, the strategy name can be injected internally through the framework.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EvictOptions is just a name, the point is to provide a struct for additional arguments that can be expanded arbitrarily while keeping the function signature consistent. If there's concern that reason is not really an "option", then I'm open to other names for the struct. But I do envision the possibility of this struct holding more traditional control options at some point.

Maybe EvictionMeta? Or maybe EvictOptions holds a nested LoggingOptions or ObservabilityInfo struct at some point which controls things like logs/metrics info, as an example.

If you could please remove the Strategy string from EvictOptions and keep context.WithValue(ctx, "strategyName", string(name)) as it is. Once all the strategies are migrated, the strategy name can be injected internally through the framework.

That sounds fine to me as a transitory step for the purpose of unblocking the rest of the framework work. It seems like we are in agreement that it's not the final design (please correct me if I'm wrong), so I'm happy to compromise on that.

I'm sorry I have held up the progress on this step, but I appreciate the open discussion. Thanks @ingvagabund! I'll make these changes and ping you when it's ready for another look

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we are in agreement that it's not the final design

We will have some time to think about it before the first iteration of the framework is finished and the design is stabilized.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 20, 2022
@ingvagabund
Copy link
Contributor

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 20, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ingvagabund

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 20, 2022
@k8s-ci-robot k8s-ci-robot merged commit c699dd1 into kubernetes-sigs:master Jul 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants