
Cache snapshot includes all jobs? #667

Closed
Jeffwan opened this issue Mar 27, 2019 · 7 comments · Fixed by #673
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@Jeffwan
Contributor

Jeffwan commented Mar 27, 2019

Please confirm whether this is expected behavior or a bug.

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
/kind question

What happened:
I notice that all jobs, even ones unrelated to kube-batch, are taken into consideration. This seems to be a bug: when the listener gets a pod, a shadow PodGroup is created for every job.

https://github.com/kubernetes-sigs/kube-batch/blob/e511be27ea352b856744e563f4dc5b95a3d4867c/pkg/scheduler/cache/event_handlers.go#L42-L44

When kube-batch allocates jobs, jobs backing system pods should be skipped, as I understand it. Otherwise even kube-system jobs are added to the cache snapshot.

https://github.com/kubernetes-sigs/kube-batch/blob/e511be27ea352b856744e563f4dc5b95a3d4867c/pkg/scheduler/cache/cache.go#L543-L555
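
Roughly, what I would expect is a filter like the sketch below before a pod's job is added to the cache and snapshot. The helper name and the exact criteria (namespace, scheduler name) are only my assumptions for discussion, not existing kube-batch code:

```go
package cache

import v1 "k8s.io/api/core/v1"

// shouldTrackPod is a hypothetical helper, not existing kube-batch code.
// It sketches the filter I would expect before a pod's job is added to the
// scheduler cache / snapshot.
func shouldTrackPod(pod *v1.Pod, mySchedulerName string) bool {
	if pod.Namespace == "kube-system" {
		// System pods are scheduled by the default scheduler.
		return false
	}
	// Only track pods that explicitly ask for this scheduler.
	return pod.Spec.SchedulerName == mySchedulerName
}
```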

What you expected to happen:
Workloads that don't rely on kube-batch for scheduling decisions should be skipped.

How to reproduce it (as minimally and precisely as possible):
Create a few Deployments and deploy kube-batch with its default configuration.

Anything else we need to know?:

I have not created any jobs; all 6 are system-level jobs.

I0326 22:48:33.645384   44955 cache.go:561] There are <6> Jobs, <1> Queues and <2> Nodes in total for scheduling.
I0326 22:48:33.645456   44955 session.go:109] Open Session f51ec308-5053-11e9-8c10-88e9fe523941 with <6> Job and <1> Queues
I0326 22:48:33.645608   44955 allocate.go:42] Enter Allocate ...
I0326 22:48:33.645717   44955 allocate.go:61] Try to allocate resource to 1 Queues
I0326 22:48:33.645765   44955 proportion.go:189] Queue <default>: deserved <cpu 420.00, memory 146800640.00, GPU 0.00>, allocated <cpu 420.00, memory 146800640.00, GPU 0.00>, share <1>
I0326 22:48:33.645885   44955 allocate.go:72] Queue <default> is overused, ignore it.
I0326 22:48:33.645942   44955 proportion.go:189] Queue <default>: deserved <cpu 420.00, memory 146800640.00, GPU 0.00>, allocated <cpu 420.00, memory 146800640.00, GPU 0.00>, share <1>
I0326 22:48:33.645980   44955 allocate.go:72] Queue <default> is overused, ignore it.
I0326 22:48:33.646001   44955 proportion.go:189] Queue <default>: deserved <cpu 420.00, memory 146800640.00, GPU 0.00>, allocated <cpu 420.00, memory 146800640.00, GPU 0.00>, share <1>
I0326 22:48:33.646027   44955 allocate.go:72] Queue <default> is overused, ignore it.
I0326 22:48:33.646042   44955 proportion.go:189] Queue <default>: deserved <cpu 420.00, memory 146800640.00, GPU 0.00>, allocated <cpu 420.00, memory 146800640.00, GPU 0.00>, share <1>
I0326 22:48:33.646061   44955 allocate.go:72] Queue <default> is overused, ignore it.
I0326 22:48:33.646072   44955 proportion.go:189] Queue <default>: deserved <cpu 420.00, memory 146800640.00, GPU 0.00>, allocated <cpu 420.00, memory 146800640.00, GPU 0.00>, share <1>
I0326 22:48:33.646105   44955 allocate.go:72] Queue <default> is overused, ignore it.
I0326 22:48:33.646129   44955 proportion.go:189] Queue <default>: deserved <cpu 420.00, memory 146800640.00, GPU 0.00>, allocated <cpu 420.00, memory 146800640.00, GPU 0.00>, share <1>
I0326 22:48:33.646147   44955 allocate.go:72] Queue <default> is overused, ignore it.
I0326 22:48:33.646174   44955 allocate.go:191] Leaving Allocate ...
I0326 22:48:33.646185   44955 backfill.go:41] Enter Backfill ...
I0326 22:48:33.646198   44955 backfill.go:71] Leaving Backfill ...

Environment:

  • Kubernetes version (use kubectl version): 1.11
  • Cloud provider or hardware configuration: aws
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@k8s-ci-robot added the kind/bug label Mar 27, 2019
@k82cn
Contributor

k82cn commented Mar 27, 2019

Oh, for a pod without a PodGroup annotation, the scheduler will create a shadow PodGroup, which enables kube-batch to schedule normal objects, e.g. a ReplicaSet.
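
To make "shadow PodGroup" concrete, a rough sketch of the idea (the helper name and grouping key are made up for illustration, not the real implementation):

```go
package api

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// shadowPodGroupKey is an illustrative name; the real code differs.
// A pod without a PodGroup annotation is grouped under its controller,
// so all pods of one ReplicaSet end up in one shadow PodGroup.
func shadowPodGroupKey(pod *v1.Pod) string {
	if owner := metav1.GetControllerOf(pod); owner != nil {
		return pod.Namespace + "/" + owner.Name
	}
	// A bare pod becomes a one-task group of its own.
	return pod.Namespace + "/" + pod.Name
}
```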

@Jeffwan
Contributor Author

Jeffwan commented Mar 27, 2019

Wouldn't it make more sense for kube-batch to schedule only those normal k8s objects that opt in to it, e.g. via a specific annotation? Otherwise even DaemonSets or system-level Deployments will be taken into consideration.

Oh, for a pod without a PodGroup annotation, the scheduler will create a shadow PodGroup, which enables kube-batch to schedule normal objects, e.g. a ReplicaSet.

@k82cn
Contributor

k82cn commented Mar 28, 2019

schedule only those normal k8s objects that opt in to it?

Good point! We should only create a shadow PodGroup for pods whose .spec.schedulerName is kube-batch.
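
Something along these lines, assuming kube-batch is registered under the scheduler name kube-batch (the function name is hypothetical):

```go
package cache

import v1 "k8s.io/api/core/v1"

// eligibleForShadowPodGroup is a hypothetical name for the proposed check.
// Pods handled by other schedulers (DaemonSets, kube-system Deployments,
// ...) would no longer get a shadow PodGroup.
func eligibleForShadowPodGroup(pod *v1.Pod) bool {
	return pod.Spec.SchedulerName == "kube-batch"
}
```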

@hex108
Contributor

hex108 commented Mar 28, 2019

We should still care about pods whose schedulerName is not kube-batch, because they could be preempted to release more resources. However, we do not need to allocate resources for them. So the shadow PodGroup is still needed, but we should not try to allocate resources for pods whose schedulerName is not kube-batch?
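
A minimal sketch of that distinction with simplified stand-in types (not the real kube-batch job object): shadow PodGroups stay in the snapshot for resource accounting, but the allocate loop skips anything that does not ask for kube-batch.

```go
package main

import "fmt"

// Job is a simplified stand-in for kube-batch's job object, only carrying
// what this sketch needs.
type Job struct {
	Name          string
	SchedulerName string
}

// allocate only hands out resources to jobs that asked for kube-batch;
// jobs owned by other schedulers stay visible for accounting but are
// skipped here.
func allocate(jobs []Job) {
	for _, job := range jobs {
		if job.SchedulerName != "kube-batch" {
			continue // tracked in the cache, never allocated by us
		}
		fmt.Printf("allocating resources for job %s\n", job.Name)
	}
}

func main() {
	allocate([]Job{
		{Name: "tf-job-1", SchedulerName: "kube-batch"},
		{Name: "coredns", SchedulerName: "default-scheduler"},
	})
}
```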

@k82cn
Contributor

k82cn commented Mar 28, 2019

they could be preempted to release more resources.

Hmm... maybe not, as we have not defined a protocol between schedulers yet. For example, there would be a ping-pong where kube-batch and the default scheduler keep preempting resources from each other's pods :)

@hex108
Contributor

hex108 commented Mar 28, 2019

Got it. That makes sense.

@hex108
Contributor

hex108 commented Mar 28, 2019

I could help fix it if you do not plan to work on it.

/assign
