Refactor analyze and merge for crane agent #171

chenkaiyue · 2022-03-01T08:03:56Z

Refactor crane agent

mfanjie · 2022-03-01T08:53:32Z

pkg/ensurance/analyzer/analyzer.go

-		series := s.getTimeSeriesFromMap(state, object.MetricRule.Selector)
+func (s *AnormalyAnalyzer) getImpacted(ts common.TimeSeries) []types.NamespacedName {
+	//TODO: basicThrottleQosPriority be a input para to be a vara in AnormalyAnalyzer
+	var basicThrottleQosPriority = executor.ClassAndPriority{PodQOSClass: v1.PodQOSBestEffort, PriorityClassValue: 0}


Is this still a TODO? As discussed offline, I don't think we should have the ClassAndPriority struct here, since more factors, such as the resource request, will be involved in the future.

OK, I have moved the container's metric out, so there is no longer a baseline and this func is gone.

mfanjie · 2022-03-01T08:57:35Z

pkg/ensurance/analyzer/analyzer.go

+	var impacted []types.NamespacedName
+	var triggered, threshold bool
+	for _, ts := range series {
+		triggered = s.evaluator.EvalWithMetric(object.MetricRule.Name, float64(object.MetricRule.Value.Value()), ts.Samples[0].Value)


Would you also add a follow-up here? Currently the evaluator is fixed as an expression; it should be an interface, with the actual method defined in the CRD.
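A minimal sketch of what such an interface might look like. The `Evaluator` interface, the two implementations, and the strategy names are all hypothetical and only illustrate the idea of picking the evaluation method from a field in the CRD; none of them is taken from the crane code base.

```go
package main

import "fmt"

// Evaluator is a hypothetical interface; the name and method signature are
// assumptions for illustration, not the crane API.
type Evaluator interface {
	// Eval reports whether the observed value breaches the threshold.
	Eval(threshold, observed float64) bool
}

// GreaterThan triggers when the observed value exceeds the threshold.
type GreaterThan struct{}

func (GreaterThan) Eval(threshold, observed float64) bool { return observed > threshold }

// LessThan triggers when the observed value drops below the threshold.
type LessThan struct{}

func (LessThan) Eval(threshold, observed float64) bool { return observed < threshold }

// NewEvaluator maps a strategy name, as it might appear in a CRD field,
// to an implementation. The strategy names are hypothetical.
func NewEvaluator(strategy string) (Evaluator, error) {
	switch strategy {
	case "GreaterThan":
		return GreaterThan{}, nil
	case "LessThan":
		return LessThan{}, nil
	default:
		return nil, fmt.Errorf("unknown evaluator strategy %q", strategy)
	}
}

func main() {
	ev, err := NewEvaluator("GreaterThan")
	if err != nil {
		panic(err)
	}
	fmt.Println(ev.Eval(4000, 4200))
}
```

With this shape the analyzer would depend only on `Evaluator`, and adding a new method would not require touching the analysis loop.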

To be accomplished:

After an action is triggered, we will no longer have a baseline to filter pods, so every pod will be selected as a candidate;

Use metrics from more dimensions to sort the pods;

In the executor, we will calculate the diff to the water level, select some pods from the sorted list according to that diff, and then end the process.

mfanjie · 2022-03-01T09:03:11Z

pkg/ensurance/analyzer/analyzer.go

-	}
+	//step1 filter dry run detections
+	var dcsFiltered []ecache.DetectionCondition
+	dcsFiltered = s.filterDryRunDetections(dcs)


I am wondering whether we need to filter dry run at all, since dry run is an action type IMO.

chenkaiyue · 2022-03-01T10:24:53Z

To be accomplished:

After an action is triggered, we will no longer have a baseline to filter pods, so every pod will be selected as a candidate;
Use metrics from more dimensions to sort the pods;
In the executor, we will calculate the diff to the water level, select some pods from the sorted list according to that diff, and then end the process.
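The plan above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `PodUsage` type, its fields, the sort order (lowest priority first, then highest CPU usage), and the `SelectPods` function are hypothetical and not part of this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// PodUsage is a hypothetical record; the field names are assumptions.
type PodUsage struct {
	Name     string
	Priority int32   // lower value means less important
	CPUMilli float64 // current CPU usage in millicores
}

// SelectPods sorts the candidates (lowest priority first, then highest usage)
// and picks pods until their combined usage covers the gap between the node
// usage and the water level.
func SelectPods(pods []PodUsage, nodeUsage, waterLevel float64) []PodUsage {
	gap := nodeUsage - waterLevel
	if gap <= 0 {
		return nil // already below the water level, nothing to do
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]PodUsage(nil), pods...)
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].Priority != sorted[j].Priority {
			return sorted[i].Priority < sorted[j].Priority
		}
		return sorted[i].CPUMilli > sorted[j].CPUMilli
	})
	var picked []PodUsage
	for _, p := range sorted {
		if gap <= 0 {
			break // the gap is covered, end the process
		}
		picked = append(picked, p)
		gap -= p.CPUMilli
	}
	return picked
}

func main() {
	pods := []PodUsage{
		{Name: "high", Priority: 0, CPUMilli: 500},
		{Name: "low", Priority: -100, CPUMilli: 3000},
		{Name: "middle", Priority: -1, CPUMilli: 2000},
	}
	for _, p := range SelectPods(pods, 5500, 4000) {
		fmt.Println(p.Name)
	}
}
```

Other dimensions, such as the resource request mentioned in the review, could be added as further tie-breakers in the comparison function.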

mfanjie · 2022-03-02T03:23:28Z

pkg/ensurance/analyzer/analyzer.go

-			if !triggered {
-				continue
-			}
+	klog.V(4).Infof("key %s, threshold %v", key, threshold)


Please refine this log; a log message should be a sentence.
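One possible wording, as a sketch only. The helper name `thresholdLogMessage` and the message text are hypothetical, and the sketch assumes that `threshold` means "the threshold was reached for this key"; the resulting string could then be passed to the existing `klog.V(4).Infof` call.

```go
package main

import "fmt"

// thresholdLogMessage is a hypothetical helper that turns the key/threshold
// pair into a full sentence. It assumes threshold == true means the rule
// identified by key has reached its threshold.
func thresholdLogMessage(key string, threshold bool) string {
	if threshold {
		return fmt.Sprintf("Rule %q has reached its threshold.", key)
	}
	return fmt.Sprintf("Rule %q has not reached its threshold.", key)
}

func main() {
	fmt.Println(thresholdLogMessage("node-cpu", true))
}
```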

mfanjie · 2022-03-02T03:25:14Z

pkg/ensurance/analyzer/analyzer.go

-				ts.Samples[0].Value,
-				common.GetValueByName(ts.Labels, common.LabelNamePodNamespace),
-				common.GetValueByName(ts.Labels, common.LabelNamePodName))
+	//step2: use opa to check if triggered and get impacted pods for container MetricRule


OPA is not implemented yet; I suggest removing the OPA keyword to avoid confusion.

mfanjie · 2022-03-02T03:29:15Z

pkg/ensurance/analyzer/analyzer.go

-			return dc, err
+func (s *AnormalyAnalyzer) computeActionContext(threshold bool, key string, object ensuranceapi.ObjectiveEnsurance, ac *ecache.ActionContext) {
+	if threshold {
+		s.restored[key] = 0


Please double-check whether this is thread safe. Each Analyze call runs in its own goroutine, so if an Analyze pass takes a long time, two goroutines could access the same s.restored map and cause unexpected behavior or a panic.
It's OK to handle this in a follow-up PR.

case state := <-s.stateChann: go s.Analyze(state)

Yeah, this is not thread safe. It is not only restored; triggered and actionEventStatus are maps too. I will make another PR to fix this.

mfanjie · 2022-03-02T03:31:58Z

pkg/ensurance/executor/throttle.go

 	cruntime "github.com/gocrane/crane/pkg/ensurance/runtime"
 	"github.com/gocrane/crane/pkg/utils"
 )

 const (
-	MAX_UP_QUOTA = 60 * 1000 // 60CU
+	MAX_UP_QUOTA          = 60 * 1000 // 60CU


I suggest using camel case for the const names as well.
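For illustration, the constant from the hunk above written in Go's usual mixed-caps style. The name `MaxUpQuota` is only a suggested spelling, not necessarily the name that was merged.

```go
package main

import "fmt"

const (
	// MaxUpQuota is a suggested camel-case spelling of MAX_UP_QUOTA.
	// The value and the trailing comment are copied from the diff.
	MaxUpQuota = 60 * 1000 // 60CU
)

func main() {
	fmt.Println(MaxUpQuota)
}
```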

chenkaiyue · 2022-03-02T07:05:07Z

For scheduler action:

chenkaiyue · 2022-03-02T08:49:36Z

For the throttle action, we use two pods:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: greedy
description: Priority for low level.
value: -100

---

apiVersion: v1
kind: Pod
metadata:
  name: low
spec:
  containers:
    - image: docker.io/gocrane/stress-ng:v0.12.09
      imagePullPolicy: Always
      name: low
      command:
      - stress-ng
      - -c
      - "3"
      - --cpu-method
      - cpuid
  priorityClassName: greedy

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: middle
description: Priority for middle level.
value: -1

---

apiVersion: v1
kind: Pod
metadata:
  name: middle
spec:
  containers:
    - image: docker.io/gocrane/stress-ng:v0.12.09
      imagePullPolicy: Always
      name: middle
      command:
        - /bin/bash
        - -c
        - "sleep 36000"
      resources:
        requests:
          memory: 2Gi
          cpu: 1
        limits:
          memory: 15Gi
          cpu: 8
  priorityClassName: middle

Then we start two stress-ng processes in the middle pod.
The CPU usage in the action is 4000.

After a while, the middle pod's CPU quota goes from 8 to 4.2,

and the CPU usage on the node is:

Calculate the cpuacct.usage changes over a period of time:

yan234280533 · 2022-03-02T08:57:04Z

/lgtm

mfanjie · 2022-03-02T09:14:31Z

/lgtm

mfanjie · 2022-03-02T09:17:22Z

conditionType: analyzed-pressure should follow the same style as the existing ones, like PIDPressure.
It would also be better to show what kind of pressure was identified instead of analyzed-pressure, but I am open on this; please consider which way is best.

chenkaiyue changed the title ~~refactor analyze and merge for crane agent~~ [WIP]Refactor analyze and merge for crane agent Mar 1, 2022

chenkaiyue force-pushed the refactorAgent branch 3 times, most recently from 7c152cb to 6abc74c Compare March 1, 2022 08:39

mfanjie reviewed Mar 1, 2022

yan234280533 reviewed Mar 1, 2022

chenkaiyue force-pushed the refactorAgent branch 2 times, most recently from 4d35082 to 99910a0 Compare March 2, 2022 02:50

mfanjie reviewed Mar 2, 2022

refactor analyze and merge module for crane agent

b5acd4a

chenkaiyue force-pushed the refactorAgent branch from 99910a0 to b5acd4a Compare March 2, 2022 07:50

mfanjie merged commit d3a4847 into gocrane:main Mar 2, 2022

mfanjie changed the title ~~[WIP]Refactor analyze and merge for crane agent~~ Refactor analyze and merge for crane agent Mar 2, 2022
