This repository has been archived by the owner on May 25, 2023. It is now read-only.

Add performance metrics for scheduling #592

Merged
merged 2 commits into from
Mar 10, 2019

Conversation

Jeffwan
Contributor

@Jeffwan Jeffwan commented Feb 14, 2019

What this PR does / why we need it:
Add metrics support for scheduling and use instrumentation to help debug performance issues.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Follow-up on #579
Fixes #487

Special notes for your reviewer:
Action and plugin durations are reported in microseconds.
End-to-end (e2e) latency is reported in milliseconds.
Histogram buckets use prometheus.ExponentialBuckets(5, 2, 10).
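
For reference, the bucket layout in the note above can be reproduced with a small standalone sketch. The helper name exponentialBuckets is hypothetical; it only imitates the arithmetic of prometheus.ExponentialBuckets(start, factor, count) so that the example runs without the client library.

```go
package main

import "fmt"

// exponentialBuckets is a hypothetical stand-in that mirrors the arithmetic of
// prometheus.ExponentialBuckets: count upper bounds, the first equal to start,
// each following one multiplied by factor.
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start
		start *= factor
	}
	return buckets
}

func main() {
	// prints [5 10 20 40 80 160 320 640 1280 2560]
	fmt.Println(exponentialBuckets(5, 2, 10))
}
```

The ten upper bounds match the le labels in the metrics output (5 through 2560); anything slower falls into the implicit +Inf bucket.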

Please check whether the attached metrics make sense to you.

Release note:

Add performance metrics for scheduling

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 14, 2019
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 14, 2019
@Jeffwan
Contributor Author

Jeffwan commented Feb 14, 2019

# HELP kube_batch_action_scheduling_latency_microseconds Action scheduling latency in microseconds
# TYPE kube_batch_action_scheduling_latency_microseconds histogram
kube_batch_action_scheduling_latency_microseconds_bucket{action="allocate",le="5.0"} 0.0
kube_batch_action_scheduling_latency_microseconds_bucket{action="allocate",le="10.0"} 0.0
kube_batch_action_scheduling_latency_microseconds_bucket{action="allocate",le="20.0"} 0.0
kube_batch_action_scheduling_latency_microseconds_bucket{action="allocate",le="40.0"} 0.0
kube_batch_action_scheduling_latency_microseconds_bucket{action="allocate",le="80.0"} 0.0
kube_batch_action_scheduling_latency_microseconds_bucket{action="allocate",le="160.0"} 0.0
kube_batch_action_scheduling_latency_microseconds_bucket{action="allocate",le="320.0"} 0.0
kube_batch_action_scheduling_latency_microseconds_bucket{action="allocate",le="640.0"} 0.0
kube_batch_action_scheduling_latency_microseconds_bucket{action="allocate",le="1280.0"} 3.0
kube_batch_action_scheduling_latency_microseconds_bucket{action="allocate",le="2560.0"} 21.0
kube_batch_action_scheduling_latency_microseconds_bucket{action="allocate",le="+Inf"} 24.0
kube_batch_action_scheduling_latency_microseconds_sum{action="allocate"} 106560.57100000001
kube_batch_action_scheduling_latency_microseconds_count{action="allocate"} 24.0
kube_batch_action_scheduling_latency_microseconds_bucket{action="backfill",le="5.0"} 0.0
kube_batch_action_scheduling_latency_microseconds_bucket{action="backfill",le="10.0"} 0.0
kube_batch_action_scheduling_latency_microseconds_bucket{action="backfill",le="20.0"} 12.0
kube_batch_action_scheduling_latency_microseconds_bucket{action="backfill",le="40.0"} 24.0
kube_batch_action_scheduling_latency_microseconds_bucket{action="backfill",le="80.0"} 24.0
kube_batch_action_scheduling_latency_microseconds_bucket{action="backfill",le="160.0"} 24.0
kube_batch_action_scheduling_latency_microseconds_bucket{action="backfill",le="320.0"} 24.0
kube_batch_action_scheduling_latency_microseconds_bucket{action="backfill",le="640.0"} 24.0
kube_batch_action_scheduling_latency_microseconds_bucket{action="backfill",le="1280.0"} 24.0
kube_batch_action_scheduling_latency_microseconds_bucket{action="backfill",le="2560.0"} 24.0
kube_batch_action_scheduling_latency_microseconds_bucket{action="backfill",le="+Inf"} 24.0
kube_batch_action_scheduling_latency_microseconds_sum{action="backfill"} 488.09799999999996
kube_batch_action_scheduling_latency_microseconds_count{action="backfill"} 24.0
# HELP kube_batch_e2e_scheduling_latency_milliseconds E2e scheduling latency in milliseconds (scheduling algorithm + binding)
# TYPE kube_batch_e2e_scheduling_latency_milliseconds histogram
kube_batch_e2e_scheduling_latency_milliseconds_bucket{le="5.0"} 24.0
kube_batch_e2e_scheduling_latency_milliseconds_bucket{le="10.0"} 24.0
kube_batch_e2e_scheduling_latency_milliseconds_bucket{le="20.0"} 24.0
kube_batch_e2e_scheduling_latency_milliseconds_bucket{le="40.0"} 24.0
kube_batch_e2e_scheduling_latency_milliseconds_bucket{le="80.0"} 24.0
kube_batch_e2e_scheduling_latency_milliseconds_bucket{le="160.0"} 24.0
kube_batch_e2e_scheduling_latency_milliseconds_bucket{le="320.0"} 24.0
kube_batch_e2e_scheduling_latency_milliseconds_bucket{le="640.0"} 24.0
kube_batch_e2e_scheduling_latency_milliseconds_bucket{le="1280.0"} 24.0
kube_batch_e2e_scheduling_latency_milliseconds_bucket{le="2560.0"} 24.0
kube_batch_e2e_scheduling_latency_milliseconds_bucket{le="+Inf"} 24.0
kube_batch_e2e_scheduling_latency_milliseconds_sum 0.017657
kube_batch_e2e_scheduling_latency_milliseconds_count 24.0
# HELP kube_batch_job_retry_counts Number of retry counts for one job
# TYPE kube_batch_job_retry_counts counter
kube_batch_job_retry_counts{job_id="qj-1"} 24.0
kube_batch_job_retry_counts{job_id="qj-2"} 24.0
kube_batch_job_retry_counts{job_id="qj-3"} 24.0
kube_batch_job_retry_counts{job_id="qj-4"} 24.0
kube_batch_job_retry_counts{job_id="qj-5"} 24.0
kube_batch_job_retry_counts{job_id="qj-6"} 24.0
# HELP kube_batch_plugin_scheduling_latency_microseconds Plugin scheduling latency in microseconds
# TYPE kube_batch_plugin_scheduling_latency_microseconds histogram
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="drf",le="5.0"} 22.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="drf",le="10.0"} 22.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="drf",le="20.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="drf",le="40.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="drf",le="80.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="drf",le="160.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="drf",le="320.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="drf",le="640.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="drf",le="1280.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="drf",le="2560.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="drf",le="+Inf"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_sum{OnSession="OnSessionClose",plugin="drf"} 46.387
kube_batch_plugin_scheduling_latency_microseconds_count{OnSession="OnSessionClose",plugin="drf"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="gang",le="5.0"} 0.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="gang",le="10.0"} 0.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="gang",le="20.0"} 0.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="gang",le="40.0"} 16.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="gang",le="80.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="gang",le="160.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="gang",le="320.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="gang",le="640.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="gang",le="1280.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="gang",le="2560.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="gang",le="+Inf"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_sum{OnSession="OnSessionClose",plugin="gang"} 840.4420000000001
kube_batch_plugin_scheduling_latency_microseconds_count{OnSession="OnSessionClose",plugin="gang"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="predicates",le="5.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="predicates",le="10.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="predicates",le="20.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="predicates",le="40.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="predicates",le="80.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="predicates",le="160.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="predicates",le="320.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="predicates",le="640.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="predicates",le="1280.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="predicates",le="2560.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="predicates",le="+Inf"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_sum{OnSession="OnSessionClose",plugin="predicates"} 8.561
kube_batch_plugin_scheduling_latency_microseconds_count{OnSession="OnSessionClose",plugin="predicates"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="priority",le="5.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="priority",le="10.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="priority",le="20.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="priority",le="40.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="priority",le="80.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="priority",le="160.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="priority",le="320.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="priority",le="640.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="priority",le="1280.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="priority",le="2560.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="priority",le="+Inf"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_sum{OnSession="OnSessionClose",plugin="priority"} 8.020000000000001
kube_batch_plugin_scheduling_latency_microseconds_count{OnSession="OnSessionClose",plugin="priority"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="proportion",le="5.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="proportion",le="10.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="proportion",le="20.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="proportion",le="40.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="proportion",le="80.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="proportion",le="160.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="proportion",le="320.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="proportion",le="640.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="proportion",le="1280.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="proportion",le="2560.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionClose",plugin="proportion",le="+Inf"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_sum{OnSession="OnSessionClose",plugin="proportion"} 9.039
kube_batch_plugin_scheduling_latency_microseconds_count{OnSession="OnSessionClose",plugin="proportion"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="drf",le="5.0"} 8.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="drf",le="10.0"} 19.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="drf",le="20.0"} 23.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="drf",le="40.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="drf",le="80.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="drf",le="160.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="drf",le="320.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="drf",le="640.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="drf",le="1280.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="drf",le="2560.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="drf",le="+Inf"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_sum{OnSession="OnSessionOpen",plugin="drf"} 181.70999999999998
kube_batch_plugin_scheduling_latency_microseconds_count{OnSession="OnSessionOpen",plugin="drf"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="gang",le="5.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="gang",le="10.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="gang",le="20.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="gang",le="40.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="gang",le="80.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="gang",le="160.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="gang",le="320.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="gang",le="640.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="gang",le="1280.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="gang",le="2560.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="gang",le="+Inf"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_sum{OnSession="OnSessionOpen",plugin="gang"} 45.257000000000005
kube_batch_plugin_scheduling_latency_microseconds_count{OnSession="OnSessionOpen",plugin="gang"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="predicates",le="5.0"} 23.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="predicates",le="10.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="predicates",le="20.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="predicates",le="40.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="predicates",le="80.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="predicates",le="160.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="predicates",le="320.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="predicates",le="640.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="predicates",le="1280.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="predicates",le="2560.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="predicates",le="+Inf"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_sum{OnSession="OnSessionOpen",plugin="predicates"} 43.967999999999996
kube_batch_plugin_scheduling_latency_microseconds_count{OnSession="OnSessionOpen",plugin="predicates"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="priority",le="5.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="priority",le="10.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="priority",le="20.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="priority",le="40.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="priority",le="80.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="priority",le="160.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="priority",le="320.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="priority",le="640.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="priority",le="1280.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="priority",le="2560.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="priority",le="+Inf"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_sum{OnSession="OnSessionOpen",plugin="priority"} 23.65400000000001
kube_batch_plugin_scheduling_latency_microseconds_count{OnSession="OnSessionOpen",plugin="priority"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="proportion",le="5.0"} 0.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="proportion",le="10.0"} 8.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="proportion",le="20.0"} 19.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="proportion",le="40.0"} 22.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="proportion",le="80.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="proportion",le="160.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="proportion",le="320.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="proportion",le="640.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="proportion",le="1280.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="proportion",le="2560.0"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_bucket{OnSession="OnSessionOpen",plugin="proportion",le="+Inf"} 24.0
kube_batch_plugin_scheduling_latency_microseconds_sum{OnSession="OnSessionOpen",plugin="proportion"} 410.36300000000006
kube_batch_plugin_scheduling_latency_microseconds_count{OnSession="OnSessionOpen",plugin="proportion"} 24.0
# HELP kube_batch_pod_preemption_victims Number of selected preemption victims
# TYPE kube_batch_pod_preemption_victims gauge
kube_batch_pod_preemption_victims 0.0
# HELP kube_batch_task_scheduling_latency_microseconds Task scheduling latency in microseconds
# TYPE kube_batch_task_scheduling_latency_microseconds histogram
kube_batch_task_scheduling_latency_microseconds_bucket{le="5.0"} 0.0
kube_batch_task_scheduling_latency_microseconds_bucket{le="10.0"} 0.0
kube_batch_task_scheduling_latency_microseconds_bucket{le="20.0"} 0.0
kube_batch_task_scheduling_latency_microseconds_bucket{le="40.0"} 0.0
kube_batch_task_scheduling_latency_microseconds_bucket{le="80.0"} 0.0
kube_batch_task_scheduling_latency_microseconds_bucket{le="160.0"} 0.0
kube_batch_task_scheduling_latency_microseconds_bucket{le="320.0"} 0.0
kube_batch_task_scheduling_latency_microseconds_bucket{le="640.0"} 0.0
kube_batch_task_scheduling_latency_microseconds_bucket{le="1280.0"} 0.0
kube_batch_task_scheduling_latency_microseconds_bucket{le="2560.0"} 0.0
kube_batch_task_scheduling_latency_microseconds_bucket{le="+Inf"} 0.0
kube_batch_task_scheduling_latency_microseconds_sum 0.0
kube_batch_task_scheduling_latency_microseconds_count 0.0
# HELP kube_batch_total_preemption_attempts Total preemption attempts in the cluster till now
# TYPE kube_batch_total_preemption_attempts counter
kube_batch_total_preemption_attempts 0.0
# HELP kube_batch_unschedule_job_count Number of jobs could not be scheduled
# TYPE kube_batch_unschedule_job_count gauge
kube_batch_unschedule_job_count 6.0
# HELP kube_batch_unschedule_task_count Number of tasks could not be scheduled
# TYPE kube_batch_unschedule_task_count gauge
kube_batch_unschedule_task_count{job_id="qj-1"} 2.0
kube_batch_unschedule_task_count{job_id="qj-2"} 3.0
kube_batch_unschedule_task_count{job_id="qj-3"} 3.0
kube_batch_unschedule_task_count{job_id="qj-4"} 3.0
kube_batch_unschedule_task_count{job_id="qj-5"} 3.0
kube_batch_unschedule_task_count{job_id="qj-6"} 3.0
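
As a quick sanity check on the dump above, the mean latency and the number of observations above the largest finite bucket can be derived from _sum, _count, and the last finite _bucket value. The sketch below does this for the action="allocate" series; the type and function names are hypothetical and not part of kube-batch.

```go
package main

import "fmt"

// histogramSnapshot holds the few values needed from one scraped histogram series.
type histogramSnapshot struct {
	lastFiniteBucket float64 // cumulative count at the largest finite le (2560 here)
	sum              float64 // value of the _sum sample
	count            float64 // value of the _count sample
}

// mean returns the average observation, or 0 when nothing was observed.
func (h histogramSnapshot) mean() float64 {
	if h.count == 0 {
		return 0
	}
	return h.sum / h.count
}

// overflow returns how many observations exceeded the largest finite bucket.
func (h histogramSnapshot) overflow() float64 {
	return h.count - h.lastFiniteBucket
}

// allocateSnapshot returns values copied from the action="allocate" series above.
func allocateSnapshot() histogramSnapshot {
	return histogramSnapshot{lastFiniteBucket: 21, sum: 106560.571, count: 24}
}

func main() {
	h := allocateSnapshot()
	fmt.Printf("mean=%.1f us, above largest bucket=%.0f of %.0f\n",
		h.mean(), h.overflow(), h.count)
}
```

For allocate this gives a mean of roughly 4440 microseconds, with 3 of 24 observations above the 2560 bound, so the top bucket may be worth a second look for that action.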

@Jeffwan
Contributor Author

Jeffwan commented Feb 14, 2019

/cc @k82cn @jiaxuanzhou

@k8s-ci-robot k8s-ci-robot requested a review from k82cn February 14, 2019 23:49
@k8s-ci-robot
Contributor

@Jeffwan: GitHub didn't allow me to request PR reviews from the following users: jiaxuanzhou.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @k82cn @jiaxuanzhou

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@@ -157,6 +161,7 @@ func (alloc *allocateAction) Execute(ssn *framework.Session) {
}

assigned = true
metrics.UpdateTaskScheduleDuration(metrics.Duration(taskScheduleStart))
Contributor Author

Is this the right way to measure kube_batch_task_scheduling_latency_microseconds ?

Contributor

No, this is not the right way to measure the latency. The start time should be taken at the point when the task (pod) is first observed by kube-batch.

Contributor

@jiaxuanzhou jiaxuanzhou Feb 15, 2019

The same applies to measuring the latency of a job.

Contributor

the start time should be measured at the point when the task (pod) is observed by kube-batch

+1

Maybe we can add more info into TaskInfo.

Contributor Author

@Jeffwan Jeffwan Feb 16, 2019

If we want to cover all stages, I think we can add a CreationTimestamp to TaskInfo: when kube-batch creates the task, it deep-copies the timestamp into the task. This timestamp would be the startTime.

We probably want to record the endTime inside job_info.go#UpdateTaskStatus: if the status is Binding, we measure the latency. Does any other status indicate that scheduling succeeded? I am not sure what Pipeline means. Should we also consider it?

Contributor

I think we can add CreationTimestamp in TaskInfo ...

If so, how do we handle the scheduler crash case? Maybe we can use the Pod's CreationTimestamp directly; it only includes the additional time from the apiserver to the scheduler, which should be short.

@k82cn
Contributor

k82cn commented Feb 19, 2019

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Jeffwan, k82cn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 19, 2019
@Jeffwan
Contributor Author

Jeffwan commented Mar 9, 2019

Sorry for leaving this hanging for so long. I will clean it up and submit a revision today.

@Jeffwan Jeffwan changed the title [WIP] Add performance metrics for scheduling Add performance metrics for scheduling Mar 10, 2019
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 10, 2019
@Jeffwan
Contributor Author

Jeffwan commented Mar 10, 2019

I found a problem when pods run on different machines. The timestamps below are formatted with time.Format(time.RFC3339Nano); note that scheduleTime is earlier than creationTime:

creationTime: 2019-03-09T19:38:56-08:00
scheduleTime: 2019-03-09T19:38:53.600548-08:00

I did some searching and noticed that you had already reported this problem here: https://groups.google.com/forum/#!topic/kubernetes-dev/rUniUqhI5YM

Seems it's the user's responsibility to keep clocks in sync.
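
The skew in the two timestamps above can be reproduced with the standard library alone. In this sketch the function names are hypothetical, and clamping negative values to zero is just one possible way to keep skewed samples out of a latency histogram; nothing in this PR says it does so.

```go
package main

import (
	"fmt"
	"time"
)

// elapsedBetween parses two RFC 3339 timestamps and returns schedule - creation.
// The result is negative when the clocks that produced them are out of sync.
func elapsedBetween(creation, schedule string) (time.Duration, error) {
	c, err := time.Parse(time.RFC3339Nano, creation)
	if err != nil {
		return 0, err
	}
	s, err := time.Parse(time.RFC3339Nano, schedule)
	if err != nil {
		return 0, err
	}
	return s.Sub(c), nil
}

// clampNonNegative turns a negative duration (clock skew) into zero.
// This is an assumed mitigation, not behavior described in the PR.
func clampNonNegative(d time.Duration) time.Duration {
	if d < 0 {
		return 0
	}
	return d
}

func main() {
	d, err := elapsedBetween("2019-03-09T19:38:56-08:00", "2019-03-09T19:38:53.600548-08:00")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("raw:", d, "clamped:", clampNonNegative(d))
}
```

With the values from this comment the raw difference is negative, about -2.4 seconds, which is why a latency derived from the two clocks cannot be trusted without synchronized time.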

@k82cn
Contributor

k82cn commented Mar 10, 2019

Seems it's the user's responsibility to keep clocks in sync.

Yes, it depends on the user's environment, as the timestamps may be set by different components.
I think that's fine for this case :)

@k82cn
Contributor

k82cn commented Mar 10, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 10, 2019
@k8s-ci-robot k8s-ci-robot merged commit 82fdced into kubernetes-retired:master Mar 10, 2019
k8s-ci-robot added a commit that referenced this pull request Mar 10, 2019
…ream-release-0.4

Automated cherry pick of #592: Update prometheus vendor libs
@Jeffwan Jeffwan deleted the metrics branch March 10, 2019 05:54
kevin-wangzefeng pushed a commit to kevin-wangzefeng/scheduler that referenced this pull request Jun 28, 2019
Add performance metrics for scheduling