-
Notifications
You must be signed in to change notification settings - Fork 264
Add performance metrics for scheduling #592
Conversation
|
/cc @k82cn @jiaxuanzhou |
@Jeffwan: GitHub didn't allow me to request PR reviews from the following users: jiaxuanzhou. Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@@ -157,6 +161,7 @@ func (alloc *allocateAction) Execute(ssn *framework.Session) { | |||
} | |||
|
|||
assigned = true | |||
metrics.UpdateTaskScheduleDuration(metrics.Duration(taskScheduleStart)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the right way to measure kube_batch_task_scheduling_latency_microseconds
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, it is not the right way to measure the latency, the start time should be measured at the point when the task(pod) observed by kube-batch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and it is the same way of measuring the latency of job
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the start time should be measured at the point when the task(pod) observed by kube-batch
+1
Maybe we can add more info into TaskInfo
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we consider all staged, I think we can add CreationTimestamp
in TaskInfo, when kube-batch create task, deep copy timestamp to task. This timestamp would be the startTime.
We probably want to add endTime inside job_info.go#UpdateTaskStatus. If status is Binding, we measure latency. Any other status indicate success of scheduling? I am not sure what's Pipeline. Should we also consider it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can add
CreationTimestamp
in TaskInfo ...
If so, how to handle scheduler crash case? Maybe we can use Pod's CreationTimestamp
directly; it only include the additional time from apiserver to scheduler which should be short.
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Jeffwan, k82cn The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Sorry for leaving this hanging for so long. I will clean it up and submit revision today. |
Found a problem if pod running on different machines, format time using creationTime: 2019-03-09T19:38:56-08:00 I did some search and notice you already report this problem here https://groups.google.com/forum/#!topic/kubernetes-dev/rUniUqhI5YM Seems it's user's responsibility to keep clock sync. |
Yes, it dependent on user's environment as the timestamp maybe set by different components. |
/lgtm |
…ream-release-0.4 Automated cherry pick of #592: Update prometheus vendor libs
Add performance metrics for scheduling
Add performance metrics for scheduling
Add performance metrics for scheduling
What this PR does / why we need it:
Add metrics support for scheduling and use instrumentation to help debug performance issues.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Follow up on #579 Fixes #487
Special notes for your reviewer:
For action, plugin duration, use microseconds,
For e2e, use milliseconds,
Use prometheus.ExponentialBuckets(5, 2, 10) for histogram buckets.
Please check attached metrics make sense to you.
Release note: