
Add new podEvictor statistics #648

Closed

Conversation

pravarag
Contributor

Fixes #503

This change adds new metrics to improve the podEvictor statistics. Currently only one metric is calculated, in evictions.go.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 17, 2021
@k8s-ci-robot
Contributor

Hi @pravarag. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 17, 2021
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 17, 2021
@a7i
Contributor

a7i commented Oct 19, 2021

🥇

It would be nice to have more insights around these metrics. What are your (and others') thoughts on including the following info?

  1. Include namespace

I think this should be easy to do, and it is something most users would care about. The PodsEvicted metric already does this (see the sketch after this list).

  2. Include all ownerReferences[].name

I don't have a use-case around this, just an idea.

  3. Inject custom labels into the metrics.

This one is tricky, but my company has standards around labels (e.g. project=, team=). It would be nice to define those in the DeschedulerPolicy somehow and inject them as "custom" labels in the metrics. I realize this is outside the scope of this PR, so I can take this on after this PR is merged.

  4. This conflicts with option 3, but another option would be to include the recommended labels.

I personally prefer option 3 (i.e. custom labels), but these are just ideas.
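
For option 1, a minimal sketch of what the counter definition could look like with a namespace label added (the metric name, help text, and label set here are illustrative only, assuming the k8s.io/component-base/metrics helpers this repository already uses):

    // Illustrative sketch: "namespace" is added next to "result" so skip counts
    // can be broken down per namespace, as PodsEvicted already allows.
    TotalPodsSkipped = metrics.NewCounterVec(
        &metrics.CounterOpts{
            Subsystem:      DeschedulerSubsystem,
            Name:           "total_pods_skipped",
            Help:           "Number of pods the descheduler skipped",
            StabilityLevel: metrics.ALPHA,
        }, []string{"result", "namespace"},
    )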

@pravarag
Contributor Author

Thanks @a7i for sharing the above comments. I have a few doubts, and since we don't have anything definite on paper for this change, I'll post them here for now:

> Inject custom labels into the metrics.

Wouldn't this be more user-specific? Not everyone will have the same custom labels. My understanding could be wrong here, but if you have any examples they would be helpful 🙂. Or do the custom labels refer to the labels here?

@a7i
Contributor

a7i commented Oct 26, 2021

> Thanks @a7i for sharing the above comments. I have a few doubts, and since we don't have anything definite on paper for this change, I'll post them here for now:
>
> Inject custom labels into the metrics.
>
> Wouldn't this be more user-specific? Not everyone will have the same custom labels. My understanding could be wrong here, but if you have any examples they would be helpful 🙂. Or do the custom labels refer to the labels here?

Sorry if I was not clear. This proposal will be done outside of this PR. Once this PR is merged, I will create a Feature with my proposal/ideas around it.

@pravarag
Contributor Author

@a7i thanks for the information shared above. Please feel free to suggest any changes I can implement as part of this PR as well.
@damemi @ingvagabund kindly review and let me know if I can update or make any further changes :)

@a7i
Contributor

a7i commented Nov 15, 2021

> @a7i thanks for the information shared above. Please feel free to suggest any changes I can implement as part of this PR as well. @damemi @ingvagabund kindly review and let me know if I can update or make any further changes :)

I would really like to see namespace included in the metrics. The PodsEvicted metric already does this.

Contributor

@damemi damemi left a comment

/ok-to-test
@pravarag could you please move the PR out of draft if it's ready for review and remove the WIP from the title?

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 15, 2021
@pravarag pravarag marked this pull request as ready for review November 16, 2021 02:50
@pravarag pravarag changed the title [WIP] add new podEvictor statistics Add new podEvictor statistics Nov 17, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 17, 2021
StabilityLevel: metrics.ALPHA,
}, []string{"result"})

TotalPodsSkipped = metrics.NewCounterVec(
Contributor

is there anywhere you're updating TotalPodsSkipped?

Contributor Author

I was not able to figure out where I should update TotalPodsSkipped; perhaps below this line and this line? Any suggestions here would be helpful.

Contributor Author

Updated.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 29, 2021
@@ -99,6 +99,7 @@ func (pe *PodEvictor) TotalEvicted() int {
for _, count := range pe.nodepodCount {
total += count
}
metrics.TotalPodsEvicted.With(map[string]string{"result": "total pods evicted so far"}).Inc()
Contributor

Why is the metric being incremented here?

Contributor Author

The reason for incrementing the metric here is to capture the total pods evicted as the count goes up on line 100. I wanted to keep it aligned with the evicted pod count going up in evictions.go so that it might be easier to record the total count. Let me know your thoughts; otherwise I'll make some changes.

Contributor

I see. It seems odd that it's being incremented in a method that is only supposed to get total evicted count.

// TotalEvicted gives a number of pods evicted through all nodes
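
Keeping TotalEvicted free of metric side effects, as suggested above, might look roughly like this (a sketch only; the fields are those already present in PodEvictor, and the counter would instead be incremented where an eviction actually succeeds):

    // Sketch: a read-only accessor with no metric updates.
    func (pe *PodEvictor) TotalEvicted() int {
        total := 0
        for _, count := range pe.nodepodCount {
            total += count
        }
        return total
    }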

Contributor Author

@pravarag pravarag Dec 13, 2021

Updated this as well (moved it to this line) and included namespace as a label on all the metrics.

@@ -112,14 +113,16 @@ func (pe *PodEvictor) EvictPod(ctx context.Context, pod *v1.Pod, node *v1.Node,
}
if pe.maxPodsToEvictPerNode > 0 && pe.nodepodCount[node]+1 > pe.maxPodsToEvictPerNode {
metrics.PodsEvicted.With(map[string]string{"result": "maximum number reached", "strategy": strategy, "namespace": pod.Namespace}).Inc()
return false, fmt.Errorf("Maximum number %v of evicted pods per %q node reached", pe.maxPodsToEvictPerNode, node.Name)
metrics.PodsEvictedSuccess.With(map[string]string{"strategy": strategy}).Inc()
Contributor

Why is PodsEvictedSuccess being incremented when maxPodsToEvictPerNode has been reached? Should this be TotalPodsSkipped?

Contributor Author

@pravarag pravarag Dec 9, 2021

Thanks for identifying this, I'll move PodsEvictedSuccess to a different and more appropriate place. I might have misinterpreted its usage here.

Edit: @a7i I've moved PodsEvictedSuccess to line 142. Let me know if this looks correct. Also, do you have any suggestions on ways I can test this?

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 11, 2021
@pravarag pravarag force-pushed the podevictor-statistics branch 2 times, most recently from 55ef3c4 to 8b5bb6a Compare December 12, 2021 14:36
@pravarag
Contributor Author

@a7i @damemi I have one doubt regarding the use of the TotalPodsSkipped metric. For its use cases, I could think of two possible situations where we might want to skip a pod and then increment the respective metric:

  1. In these two cases, when we have reached the maximum number of pods per node or namespace: L120-125
  2. After this line, if there is any error while trying to evict a pod.

Any suggestions on where else I can use the TotalPodsSkipped metric would be helpful.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 20, 2022
@damemi
Contributor

damemi commented Jan 20, 2022

@pravarag I think this looks good, just needs a rebase
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: damemi, pravarag

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 20, 2022
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 21, 2022
@pravarag
Contributor Author

Thanks @damemi , I've rebased the branch as suggested.

Contributor

@ingvagabund ingvagabund left a comment

I wonder what the benefit of the new TotalPodsSkipped, PodsEvictedSuccess, and PodsEvictedFailed metrics is, besides simpler expressions when querying the metrics? Currently they are a subset of what is already provided by PodsEvicted.

When running the descheduler with metrics enabled, the PodsEvicted metric gets continuously incremented over time as pods are regularly evicted. In order to provide the number of pods evicted/skipped/... in a single run, the pod evictor needs to be signaled that a new run/descheduling cycle has started.

if pe.metricsEnabled {
metrics.PodsEvicted.With(map[string]string{"result": "maximum number of pods per node reached", "strategy": strategy, "namespace": pod.Namespace, "node": node.Name}).Inc()
}
metrics.PodsEvicted.With(map[string]string{"result": "maximum number of pods per node reached", "strategy": strategy, "namespace": pod.Namespace, "node": node.Name}).Inc()
Contributor

Would you mind putting all the places where your new metrics are populated under the same condition? The PodEvictor was recently extended with the pe.metricsEnabled condition to populate the metrics only when the metrics server is running.
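
A minimal sketch of that guard around one of the new increments (label values taken from this PR; it assumes the same pe.metricsEnabled flag already used for PodsEvicted):

    // Sketch: populate the new counter only when the metrics server is running,
    // mirroring the existing pe.metricsEnabled guard on PodsEvicted.
    if pe.metricsEnabled {
        metrics.TotalPodsSkipped.With(map[string]string{
            "result":    "maximum number of pods per node reached",
            "namespace": pod.Namespace,
        }).Inc()
    }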

&metrics.CounterOpts{
Subsystem: DeschedulerSubsystem,
Name: "total_pods_skipped",
Help: "Total pods skipped for a single run",
Contributor

I wonder what a single run means in this context? Is it meant as a single descheduling cycle? The TotalPodsSkipped metric is used only inside the PodEvictor, where there's currently no way to distinguish when a single run finished and a new one started.

Contributor

Maybe we could introduce a copy of these metrics named something like PodsSkippedLastRun, for example. It would require some refactoring of the PodEvictor: maybe pass it a timestamp at the start of a new run, and if that timestamp differs from the last one it remembers, reset the counters for those metrics. But I think knowing the totals vs. individual runs would be helpful.

I suppose you could get the total by just summing the individual runs, in which case we wouldn't need the Total.. metrics anymore. Either way, I think this is a good start and the refactors could come as a follow-up. Wdyt?
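
A very rough sketch of that idea, assuming a hypothetical PodsSkippedLastRun counter and a lastRun field on PodEvictor (neither exists in this PR); the reset uses the CounterVec Reset method provided by the underlying Prometheus client library:

    // Hypothetical sketch: reset the per-run counter whenever a new descheduling
    // cycle starts, identified by a timestamp passed in by the caller.
    func (pe *PodEvictor) startRun(ts time.Time) {
        if !ts.Equal(pe.lastRun) {
            pe.lastRun = ts
            metrics.PodsSkippedLastRun.Reset()
        }
    }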

Contributor Author

By a single run I meant a single descheduling cycle (somewhat mentioned in this comment: #503 (comment)). So I just wanted to check whether it's still a good idea to introduce this metric or not, @damemi @ingvagabund.

@@ -184,6 +184,7 @@ func evictPod(ctx context.Context, client clientset.Interface, pod *v1.Pod, poli
err := client.PolicyV1beta1().Evictions(eviction.Namespace).Evict(ctx, eviction)

if apierrors.IsTooManyRequests(err) {
metrics.TotalPodsSkipped.With(map[string]string{"result": "total pods skipped so far", "namespace": pod.Namespace}).Inc()
Contributor

Setting the result to total pods skipped so far is redundant here given the metric's name is TotalPodsSkipped. I wonder if it would make more sense to set the result to pod skipped due to TooManyRequests error?
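
For example, the label in the hunk above could instead describe why the pod was skipped (a sketch of this suggestion, not final wording):

    // Sketch: make the result label describe the skip reason rather than
    // restate the metric name.
    metrics.TotalPodsSkipped.With(map[string]string{
        "result":    "pod skipped due to TooManyRequests error",
        "namespace": pod.Namespace,
    }).Inc()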

return false, nil
}

metrics.PodsEvictedSuccess.With(map[string]string{"strategy": strategy, "namespace": pod.Namespace}).Inc()
Contributor

PodsEvictedSuccess is a special case of PodsEvicted (with result label set to success and node label ignored).

metrics.PodsEvicted.With(map[string]string{"result": "error", "strategy": strategy, "namespace": pod.Namespace, "node": node.Name}).Inc()
}
metrics.PodsEvicted.With(map[string]string{"result": "error", "strategy": strategy, "namespace": pod.Namespace, "node": node.Name}).Inc()
metrics.PodsEvictedFailed.With(map[string]string{"strategy": strategy, "namespace": pod.Namespace}).Inc()
Contributor

PodsEvictedFailed is a special case of PodsEvicted (with the result label set to error and the node label ignored). I wonder what the benefit of creating this metric is compared to PodsEvicted?

Contributor Author

So this was again aimed at capturing failed-eviction metrics for a single run. Are you suggesting we rely on the PodsEvicted metric alone instead of PodsEvictedFailed?

metrics.PodsEvicted.With(map[string]string{"result": "maximum number of pods per namespace reached", "strategy": strategy, "namespace": pod.Namespace, "node": node.Name}).Inc()
}
metrics.PodsEvicted.With(map[string]string{"result": "maximum number of pods per namespace reached", "strategy": strategy, "namespace": pod.Namespace, "node": node.Name}).Inc()
metrics.TotalPodsSkipped.With(map[string]string{"result": "total pods skipped so far", "namespace": pod.Namespace}).Inc()
Contributor

TotalPodsSkipped called here is a special case of PodsEvicted (with result label set to maximum number of pods per namespace reached and node label ignored).

return false, fmt.Errorf("Maximum number %v of evicted pods per %q namespace reached", *pe.maxPodsToEvictPerNamespace, pod.Namespace)
}

err := evictPod(ctx, pe.client, pod, pe.policyGroupVersion)
// increment TotalPodsEvicted
metrics.TotalPodsEvicted.With(map[string]string{"result": "total pods evicted so far", "namespace": pod.Namespace}).Inc()
Contributor

At this point it is unknown if a pod was evicted or not. Thus, incrementing the TotalPodsEvicted does not reflect the actual pod eviction. The metric currently captures how many times the evictPod function was invoked.
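
Incrementing only after the eviction call reports success would tie the counter to actual evictions rather than attempts; a rough sketch (the result value shown is illustrative, and the error path is elided):

    // Sketch: count an eviction only once evictPod returns without error.
    err := evictPod(ctx, pe.client, pod, pe.policyGroupVersion)
    if err == nil && pe.metricsEnabled {
        metrics.TotalPodsEvicted.With(map[string]string{"result": "pod evicted", "namespace": pod.Namespace}).Inc()
    }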

Contributor Author

I see, I'll update this part. Thanks!

@ingvagabund
Contributor

ingvagabund commented Jan 21, 2022

Within each descheduling cycle a new pod evictor instance is created, which eliminates any way of telling the pod evictor that "a run has ended". The PodsEvicted metric is shared across cycles, so there's no distinction of which increments belong to which descheduling cycle. On the other hand, when the descheduling cycle is set to e.g. 1h, there's no need for such a distinction, as one can either see the steps in a graph or use a delta{...[1h]} expression to see the increments per cycle.

@jklaw90
Contributor

jklaw90 commented Feb 2, 2022

Hey @pravarag, I was looking into similar metrics around evictions. Are you still planning on finishing this PR, or do you need any help?

@pravarag
Contributor Author

pravarag commented Feb 2, 2022

Hey @jklaw90 , I'm working on addressing the latest review comments for this PR. If you have any suggestions/comments, please feel free to share on this PR and I'll definitely check those 🙂 .

@pravarag
Contributor Author

pravarag commented Feb 2, 2022

> Within each descheduling cycle a new pod evictor instance is created, which eliminates any way of telling the pod evictor that "a run has ended". The PodsEvicted metric is shared across cycles, so there's no distinction of which increments belong to which descheduling cycle. On the other hand, when the descheduling cycle is set to e.g. 1h, there's no need for such a distinction, as one can either see the steps in a graph or use a delta{...[1h]} expression to see the increments per cycle.

@ingvagabund @damemi thanks for the detailed review. A few questions I have now:

  1. Do we still want to capture metrics for a single run?
  2. Are there any other metrics that will need to be added or removed as part of this PR?

@pravarag
Contributor Author

@damemi @ingvagabund are we planning to merge these changes as part of release 1.24? If so, I would like your suggestions on moving this forward with all the changes required :)

@damemi
Contributor

damemi commented Feb 23, 2022

@pravarag I would like to merge this, and sorry it's taken so long.

I think the only thing still being discussed was the fact that this is reporting cumulative pod evictions rather than single-run as implied. @ingvagabund is that all that was left to sort out? I wonder if maybe a different metric type, like a histogram, could solve this and provide easier access to single-run metrics

@pravarag
Contributor Author

> @pravarag I would like to merge this, and sorry it's taken so long.
>
> I think the only thing still being discussed was the fact that this is reporting cumulative pod evictions rather than single-run as implied. @ingvagabund is that all that was left to sort out? I wonder if maybe a different metric type, like a histogram, could solve this and provide easier access to single-run metrics

I can give it a try by implementing a histogram-type metric for a single run.

@ingvagabund
Contributor

I wonder if we still need to capture the per-single-run metrics, given we can use a delta{...[1h]} query in the Prometheus UI. I wonder if there's a metric type which would allow capturing sequences in time. A histogram is another cumulative metric type, so it will not help here.

@pravarag I am sorry. I don't think there's anything else left to do for the moment with regard to capturing metrics for a single run; rather, it's a matter of going back to the drawing board and identifying new metrics which we might add to the code.

@pravarag
Contributor Author

Thanks @ingvagabund @damemi for your reviews on this. Since, as mentioned above, we may not need to capture metrics for a single run, I guess I can close this PR then? I've already put some effort into it, so I don't want to leave it unfinished; this has been a good learning experience for me as well. Kindly let me know how we can rethink implementing newer metrics as part of this issue; I'm definitely open to a discussion around it, be it on this PR itself or back on the original issue :)

@damemi
Contributor

damemi commented Apr 5, 2022

@pravarag sorry about this. It sounds like we've come back around to not needing single-run metrics. I think per-strategy metrics could be a good option, though. What do you think?

@pravarag
Contributor Author

pravarag commented Apr 7, 2022

> @pravarag sorry about this. It sounds like we've come back around to not needing single-run metrics. I think per-strategy metrics could be a good option, though. What do you think?

I think that's a good idea @damemi @ingvagabund, and I can work on implementing those. But I wanted to check how to approach it: do I need to open a proposal or a new discussion around it, maybe a new issue? Or are we good to continue in issue #503 itself?

@pravarag
Contributor Author

On the other hand, I was thinking I could just close this PR and open a fresh PR with the new (per-strategy) changes 🤔, and we can continue in that same issue.

@ingvagabund
Contributor

The new descheduling framework will have more options for introducing new metrics.

@pravarag
Contributor Author

@ingvagabund by options, do you mean the newer metrics will be added as part of the new descheduling framework? If so, then we can probably close this PR and the issue related to it?

@ingvagabund
Contributor

Opening a new PR sounds reasonable so we can start fresh. The issue can stay open though.

@pravarag
Contributor Author

Closing this PR based on the above discussion; I will open a fresh one with the newer changes.
