
Cache snapshot includes all jobs? #667

Closed
Jeffwan opened this issue Mar 27, 2019 · 7 comments · Fixed by #673
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@Jeffwan
Contributor

Jeffwan commented Mar 27, 2019

Please confirm whether this is expected behavior or a bug.

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
/kind question

What happened:
I notice that all jobs, even ones unrelated to kube-batch, are taken into consideration. This seems to be a bug: when the listener gets a pod, a shadow PodGroup is created for every job.

https://github.com/kubernetes-sigs/kube-batch/blob/e511be27ea352b856744e563f4dc5b95a3d4867c/pkg/scheduler/cache/event_handlers.go#L42-L44

When kube-batch allocates jobs, jobs backing system pods should be skipped, as I understand it. Otherwise even kube-system jobs are added to the cache snapshot.

https://github.com/kubernetes-sigs/kube-batch/blob/e511be27ea352b856744e563f4dc5b95a3d4867c/pkg/scheduler/cache/cache.go#L543-L555
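
Roughly, what I would expect is a filter like the sketch below before a pod's job is added to the cache and snapshot. The helper name and the exact criteria (namespace, scheduler name) are only my assumptions for discussion, not existing kube-batch code:

```go
package cache

import v1 "k8s.io/api/core/v1"

// shouldTrackPod is a hypothetical helper, not existing kube-batch code.
// It sketches the filter I would expect before a pod's job is added to the
// scheduler cache / snapshot.
func shouldTrackPod(pod *v1.Pod, mySchedulerName string) bool {
	if pod.Namespace == "kube-system" {
		// System pods are scheduled by the default scheduler.
		return false
	}
	// Only track pods that explicitly ask for this scheduler.
	return pod.Spec.SchedulerName == mySchedulerName
}
```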

What you expected to happen:
Workloads that don't rely on kube-batch for scheduling decisions should be skipped.

How to reproduce it (as minimally and precisely as possible):
Create a few Deployments and deploy kube-batch with its default configuration.

Anything else we need to know?:

I have not created any jobs; all 6 are system-level jobs.

I0326 22:48:33.645384   44955 cache.go:561] There are <6> Jobs, <1> Queues and <2> Nodes in total for scheduling.
I0326 22:48:33.645456   44955 session.go:109] Open Session f51ec308-5053-11e9-8c10-88e9fe523941 with <6> Job and <1> Queues
I0326 22:48:33.645608   44955 allocate.go:42] Enter Allocate ...
I0326 22:48:33.645717   44955 allocate.go:61] Try to allocate resource to 1 Queues
I0326 22:48:33.645765   44955 proportion.go:189] Queue <default>: deserved <cpu 420.00, memory 146800640.00, GPU 0.00>, allocated <cpu 420.00, memory 146800640.00, GPU 0.00>, share <1>
I0326 22:48:33.645885   44955 allocate.go:72] Queue <default> is overused, ignore it.
I0326 22:48:33.645942   44955 proportion.go:189] Queue <default>: deserved <cpu 420.00, memory 146800640.00, GPU 0.00>, allocated <cpu 420.00, memory 146800640.00, GPU 0.00>, share <1>
I0326 22:48:33.645980   44955 allocate.go:72] Queue <default> is overused, ignore it.
I0326 22:48:33.646001   44955 proportion.go:189] Queue <default>: deserved <cpu 420.00, memory 146800640.00, GPU 0.00>, allocated <cpu 420.00, memory 146800640.00, GPU 0.00>, share <1>
I0326 22:48:33.646027   44955 allocate.go:72] Queue <default> is overused, ignore it.
I0326 22:48:33.646042   44955 proportion.go:189] Queue <default>: deserved <cpu 420.00, memory 146800640.00, GPU 0.00>, allocated <cpu 420.00, memory 146800640.00, GPU 0.00>, share <1>
I0326 22:48:33.646061   44955 allocate.go:72] Queue <default> is overused, ignore it.
I0326 22:48:33.646072   44955 proportion.go:189] Queue <default>: deserved <cpu 420.00, memory 146800640.00, GPU 0.00>, allocated <cpu 420.00, memory 146800640.00, GPU 0.00>, share <1>
I0326 22:48:33.646105   44955 allocate.go:72] Queue <default> is overused, ignore it.
I0326 22:48:33.646129   44955 proportion.go:189] Queue <default>: deserved <cpu 420.00, memory 146800640.00, GPU 0.00>, allocated <cpu 420.00, memory 146800640.00, GPU 0.00>, share <1>
I0326 22:48:33.646147   44955 allocate.go:72] Queue <default> is overused, ignore it.
I0326 22:48:33.646174   44955 allocate.go:191] Leaving Allocate ...
I0326 22:48:33.646185   44955 backfill.go:41] Enter Backfill ...
I0326 22:48:33.646198   44955 backfill.go:71] Leaving Backfill ...

Environment:

  • Kubernetes version (use kubectl version): 1.11
  • Cloud provider or hardware configuration: aws
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@k8s-ci-robot added the kind/bug label Mar 27, 2019
@k82cn
Contributor

k82cn commented Mar 27, 2019

Oh, for a pod without a PodGroup annotation, the scheduler will create a shadow PodGroup, which enables kube-batch to schedule normal objects, e.g. a ReplicaSet.
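
To make "shadow PodGroup" concrete, a rough sketch of the idea (the helper name and grouping key are made up for illustration, not the real implementation):

```go
package api

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// shadowPodGroupKey is an illustrative name; the real code differs.
// A pod without a PodGroup annotation is grouped under its controller,
// so all pods of one ReplicaSet end up in one shadow PodGroup.
func shadowPodGroupKey(pod *v1.Pod) string {
	if owner := metav1.GetControllerOf(pod); owner != nil {
		return pod.Namespace + "/" + owner.Name
	}
	// A bare pod becomes a one-task group of its own.
	return pod.Namespace + "/" + pod.Name
}
```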

@Jeffwan
Contributor Author

Jeffwan commented Mar 27, 2019

Wouldn't it make more sense for kube-batch to schedule only those normal k8s objects that opt in to it, e.g. via a specific annotation? Otherwise even DaemonSets or system-level Deployments will be taken into consideration.

Oh, for a pod without a PodGroup annotation, the scheduler will create a shadow PodGroup, which enables kube-batch to schedule normal objects, e.g. a ReplicaSet.

@k82cn
Contributor

k82cn commented Mar 28, 2019

schedule only those normal k8s objects that opt in to it?

Good point! We should only create a shadow PodGroup for pods whose .spec.schedulerName is kube-batch.
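
Something along these lines, assuming kube-batch is registered under the scheduler name kube-batch (the function name is hypothetical):

```go
package cache

import v1 "k8s.io/api/core/v1"

// eligibleForShadowPodGroup is a hypothetical name for the proposed check.
// Pods handled by other schedulers (DaemonSets, kube-system Deployments,
// ...) would no longer get a shadow PodGroup.
func eligibleForShadowPodGroup(pod *v1.Pod) bool {
	return pod.Spec.SchedulerName == "kube-batch"
}
```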

@hex108
Contributor

hex108 commented Mar 28, 2019

We should still care about pods whose schedulerName is not kube-batch, because they could be preempted to release more resources. However, we do not need to allocate resources for them. So the shadow PodGroup is still needed, but we should not try to allocate resources for pods whose schedulerName is not kube-batch?
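
A minimal sketch of that distinction with simplified stand-in types (not the real kube-batch job object): shadow PodGroups stay in the snapshot for resource accounting, but the allocate loop skips anything that does not ask for kube-batch.

```go
package main

import "fmt"

// Job is a simplified stand-in for kube-batch's job object, only carrying
// what this sketch needs.
type Job struct {
	Name          string
	SchedulerName string
}

// allocate only hands out resources to jobs that asked for kube-batch;
// jobs owned by other schedulers stay visible for accounting but are
// skipped here.
func allocate(jobs []Job) {
	for _, job := range jobs {
		if job.SchedulerName != "kube-batch" {
			continue // tracked in the cache, never allocated by us
		}
		fmt.Printf("allocating resources for job %s\n", job.Name)
	}
}

func main() {
	allocate([]Job{
		{Name: "tf-job-1", SchedulerName: "kube-batch"},
		{Name: "coredns", SchedulerName: "default-scheduler"},
	})
}
```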

@k82cn
Contributor

k82cn commented Mar 28, 2019

they could be preempted to release more resources.

Hmm... maybe not, as we have not defined a protocol between schedulers yet. For example, there would be a ping-pong where kube-batch and the default scheduler keep preempting resources from each other's pods :)

@hex108
Contributor

hex108 commented Mar 28, 2019

Got it. That makes sense.

@hex108
Contributor

hex108 commented Mar 28, 2019

I could help fix it if you do not plan to work on it.

/assign
