fix the bug for workload resource promql
qmhu committed Nov 30, 2022
1 parent 2506d2c commit bc60f10
Showing 6 changed files with 115 additions and 4 deletions.
4 changes: 2 additions & 2 deletions pkg/utils/expression_prom_default.go
@@ -7,9 +7,9 @@ import (
// todo: later we change these templates to configurable like prometheus-adapter
const (
// WorkloadCpuUsageExprTemplate is used to query workload cpu usage by promql, param is namespace,workload-name,duration str
-WorkloadCpuUsageExprTemplate = `sum(irate(container_cpu_usage_seconds_total{namespace="%s",pod=~"^%s-.*$"}[%s]))`
+WorkloadCpuUsageExprTemplate = `sum(irate(container_cpu_usage_seconds_total{namespace="%s",pod=~"^%s-.*$",container!=""}[%s]))`
// WorkloadMemUsageExprTemplate is used to query workload mem usage by promql, param is namespace, workload-name
-WorkloadMemUsageExprTemplate = `sum(container_memory_working_set_bytes{namespace="%s",pod=~"^%s-.*$"})`
+WorkloadMemUsageExprTemplate = `sum(container_memory_working_set_bytes{namespace="%s",pod=~"^%s-.*$",container!=""})`

// following is node exporter metric for node cpu/memory usage
// NodeCpuUsageExprTemplate is used to query node cpu usage by promql, param is node name which prometheus scrape, duration str
@@ -0,0 +1,62 @@
---
title: "IdleNode Recommendation"
description: "Introduction to IdleNode Recommendation"
weight: 15
---

By scanning the status and utilization of nodes, the idle node recommendation helps users to find idle Kubernetes nodes.

## Motivation

In a Kubernetes cluster, some nodes often sit idle because of node taints, label selectors, a low packing rate, or low utilization, which wastes a lot of money. IdleNode recommendation tries to help users find these nodes to reduce cost.

## Example

```yaml
kind: Recommendation
apiVersion: analysis.crane.io/v1alpha1
metadata:
  name: idlenodes-rule-idlenode-5jxn9
  namespace: crane-system
  labels:
    analysis.crane.io/recommendation-rule-name: idlenodes-rule
    analysis.crane.io/recommendation-rule-recommender: IdleNode
    analysis.crane.io/recommendation-rule-uid: 8921a198-7082-11ed-8b7b-246e960a8d8c
    analysis.crane.io/recommendation-target-kind: Node
    analysis.crane.io/recommendation-target-name: worker-node-1
    analysis.crane.io/recommendation-target-version: v1
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: bareMetal
    beta.kubernetes.io/os: linux
  ownerReferences:
    - apiVersion: analysis.crane.io/v1alpha1
      kind: RecommendationRule
      name: idlenodes-rule
      uid: 8921a198-7082-11ed-8b7b-246e960a8d8c
      controller: false
      blockOwnerDeletion: false
spec:
  targetRef:
    kind: Node
    name: worker-node-1
    apiVersion: v1
  type: IdleNode
  completionStrategy: {}
status:
  targetRef: {}
  action: Delete
  lastUpdateTime: '2022-11-30T07:46:57Z'
```

In this example:

- The Recommendation's TargetRef points to Node: worker-node-1
- The Recommendation type is IdleNode
- The action is Delete. Taking a node offline is a complicated operation, so the recommendation is advisory only.

## Implementation

An idle node recommendation is produced in the following steps:

1. Scan all nodes and pods in the cluster.
2. If all Pods on a node are DaemonSet Pods, the node is considered idle.
@@ -126,6 +126,8 @@ Currently, Crane support these Recommenders:

- [**Resource Recommendation**](/docs/tutorials/recommendation/resource-recommendation): Use the VPA algorithm to analyze the actual usage of applications and recommend more appropriate resource configurations.
- [**Replicas Recommendation**](/docs/tutorials/recommendation/replicas-recommendation): Use the HPA algorithm to analyze the actual usage of applications and recommend more appropriate replicas configurations.
- [**IdleNode Recommendation**](/docs/tutorials/recommendation/idlenode-recommendation): Find idle nodes in the cluster.


### Recommender Framework

@@ -10,6 +10,50 @@ weight: 15

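When using Kubernetes, some nodes often sit idle because of taint configuration, label selectors, a low packing rate, low utilization, and similar factors, which wastes a lot of money. The idle node recommendation tries to help users find these nodes to optimize cost.

## Recommendation Example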

```yaml
kind: Recommendation
apiVersion: analysis.crane.io/v1alpha1
metadata:
  name: idlenodes-rule-idlenode-5jxn9
  namespace: crane-system
  labels:
    analysis.crane.io/recommendation-rule-name: idlenodes-rule
    analysis.crane.io/recommendation-rule-recommender: IdleNode
    analysis.crane.io/recommendation-rule-uid: 8921a198-7082-11ed-8b7b-246e960a8d8c
    analysis.crane.io/recommendation-target-kind: Node
    analysis.crane.io/recommendation-target-name: worker-node-1
    analysis.crane.io/recommendation-target-version: v1
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: bareMetal
    beta.kubernetes.io/os: linux
  ownerReferences:
    - apiVersion: analysis.crane.io/v1alpha1
      kind: RecommendationRule
      name: idlenodes-rule
      uid: 8921a198-7082-11ed-8b7b-246e960a8d8c
      controller: false
      blockOwnerDeletion: false
spec:
  targetRef:
    kind: Node
    name: worker-node-1
    apiVersion: v1
  type: IdleNode
  completionStrategy: {}
status:
  targetRef: {}
  action: Delete
  lastUpdateTime: '2022-11-30T07:46:57Z'
```
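
In this example:

- The recommendation's TargetRef points to Node: worker-node-1
- The recommendation type is idle node recommendation
- The action is Delete, but taking a node offline is a complex operation, so this is only a suggestion

## Implementation

The idle node recommendation completes one recommendation run in the following steps: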
@@ -114,11 +114,11 @@ status:
Taking the Deployment Craned in crane-system as an example, users can replace container, namespace, and pod with the values for the recommendation result they want to verify.

```shell
-sum(irate(container_cpu_usage_seconds_total{namespace="crane-system",pod=~"^craned-.*$"}[3m])) # cpu usage
+sum(irate(container_cpu_usage_seconds_total{namespace="crane-system",pod=~"^craned-.*$",container!=""}[3m])) # cpu usage
```

```shell
-sum(container_memory_working_set_bytes{namespace="crane-system",pod=~"^craned-.*$"}) # memory usage
+sum(container_memory_working_set_bytes{namespace="crane-system",pod=~"^craned-.*$",container!=""}) # memory usage
```
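
The queries above have the same shape as the templates changed in `pkg/utils/expression_prom_default.go`. As a rough sketch, the templates could be filled in with `fmt.Sprintf` as shown below. The template strings and their parameter order (namespace, workload name, duration) come from this commit; the helper functions `WorkloadCPUExpr` and `WorkloadMemExpr` are hypothetical and only illustrate the substitution, since the commit does not show how Crane itself calls the templates. The commit also does not say why `container!=""` was added; a likely reason is that cAdvisor also exports a pod-level series with an empty `container` label, which would otherwise be summed together with the per-container series.

```go
package main

import "fmt"

const (
	// Templates as they appear after this commit.
	WorkloadCpuUsageExprTemplate = `sum(irate(container_cpu_usage_seconds_total{namespace="%s",pod=~"^%s-.*$",container!=""}[%s]))`
	WorkloadMemUsageExprTemplate = `sum(container_memory_working_set_bytes{namespace="%s",pod=~"^%s-.*$",container!=""})`
)

// WorkloadCPUExpr is a hypothetical helper that fills in the CPU template.
func WorkloadCPUExpr(namespace, workload, duration string) string {
	return fmt.Sprintf(WorkloadCpuUsageExprTemplate, namespace, workload, duration)
}

// WorkloadMemExpr is a hypothetical helper that fills in the memory template.
func WorkloadMemExpr(namespace, workload string) string {
	return fmt.Sprintf(WorkloadMemUsageExprTemplate, namespace, workload)
}

func main() {
	// Reproduces the two verification queries for the craned Deployment.
	fmt.Println(WorkloadCPUExpr("crane-system", "craned", "3m"))
	fmt.Println(WorkloadMemExpr("crane-system", "craned"))
}
```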

## Supported Resource Types
@@ -33,6 +33,9 @@ metadata:
analysis.crane.io/recommendation-rule-name: workloads-rule
analysis.crane.io/recommendation-rule-recommender: Resource
analysis.crane.io/recommendation-rule-uid: 18588495-f325-4873-b45a-7acfe9f1ba94
analysis.crane.io/recommendation-target-kind: Deployment
analysis.crane.io/recommendation-target-name: load-test
analysis.crane.io/recommendation-target-version: v1
app: craned
app.kubernetes.io/instance: crane
app.kubernetes.io/managed-by: Helm