-
Notifications
You must be signed in to change notification settings - Fork 386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor analyze and merge for crane agent #171
Conversation
7c152cb
to
6abc74c
Compare
pkg/ensurance/analyzer/analyzer.go
Outdated
series := s.getTimeSeriesFromMap(state, object.MetricRule.Selector) | ||
func (s *AnormalyAnalyzer) getImpacted(ts common.TimeSeries) []types.NamespacedName { | ||
//TODO: basicThrottleQosPriority be a input para to be a vara in AnormalyAnalyzer | ||
var basicThrottleQosPriority = executor.ClassAndPriority{PodQOSClass: v1.PodQOSBestEffort, PriorityClassValue: 0} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this still a todo? as discussed offline, I don't think we should have ClassAndPriority struct here, as more factor would be involved in the future, like the resource request
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I have moved the container's metric out, so there will don't have the baseline and this func has gone.
var impacted []types.NamespacedName | ||
var triggered, threshold bool | ||
for _, ts := range series { | ||
triggered = s.evaluator.EvalWithMetric(object.MetricRule.Name, float64(object.MetricRule.Value.Value()), ts.Samples[0].Value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would you also add a follow up here? as current the evaluator is fixed as expression, it should be a interface and the actual method is defined in the CRD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be accomplished:
- After an action is triggered, we will don't have the baseline to filter pods, we will select every pod as an alternative;
- Use more dimensional metrics to sort pods;
- In executor, we will calculate the diff to the water level, and select some pods from the sorted pods according to the diff and end the process.
pkg/ensurance/analyzer/analyzer.go
Outdated
} | ||
//step1 filter dry run detections | ||
var dcsFiltered []ecache.DetectionCondition | ||
dcsFiltered = s.filterDryRunDetections(dcs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if we need filter dry run? as dry run is an action type IMO
To be accomplished:
|
4d35082
to
99910a0
Compare
pkg/ensurance/analyzer/analyzer.go
Outdated
if !triggered { | ||
continue | ||
} | ||
klog.V(4).Infof("key %s, threshold %v", key, threshold) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plz refine this log, log should be a sentence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/ensurance/analyzer/analyzer.go
Outdated
ts.Samples[0].Value, | ||
common.GetValueByName(ts.Labels, common.LabelNamePodNamespace), | ||
common.GetValueByName(ts.Labels, common.LabelNamePodName)) | ||
//step2: use opa to check if triggered and get impacted pods for container MetricRule |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opa is not implemented yet, suggest to remove keyword of opa to avoid confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return dc, err | ||
func (s *AnormalyAnalyzer) computeActionContext(threshold bool, key string, object ensuranceapi.ObjectiveEnsurance, ac *ecache.ActionContext) { | ||
if threshold { | ||
s.restored[key] = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please double confirm is this thread safe? the whole Analyze function is executed in different go routine, in case the Analyze process takes time, it is possible that two threads access the same s.restored map and result in unexpected behavior or panic.
It's ok to have a follow up PR.
case state := <-s.stateChann:
go s.Analyze(state)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is not thread safe. Not only for restored, triggered and actionEventStatus are maps too. I will make another PR to fix this.
pkg/ensurance/executor/throttle.go
Outdated
cruntime "github.com/gocrane/crane/pkg/ensurance/runtime" | ||
"github.com/gocrane/crane/pkg/utils" | ||
) | ||
|
||
const ( | ||
MAX_UP_QUOTA = 60 * 1000 // 60CU | ||
MAX_UP_QUOTA = 60 * 1000 // 60CU |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest to use Camel style to define const as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
99910a0
to
b5acd4a
Compare
/lgtm |
1 similar comment
/lgtm |
conditionType: analyzed-pressure should follow same style with existing ones, like |
Refactor crane agent