What happened:
In the `allocate` action, nodes with `terminating` pods are also permitted to go through `predicateFn` as long as `futureIdle` (`idle` + `releasing`) is larger than the pod request (volcano/pkg/scheduler/actions/allocate/allocate.go, lines 99 to 106 in bed4a04).
So nodes with terminating pods can become candidates in the score period even though they cannot provide enough resources for the pending pod to be allocated immediately. As a result, in some scenarios nodes with terminating pods score higher than nodes with enough idle resources.
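For context, the admission check described above behaves roughly like the sketch below. The function and variable names are made up for illustration (the real code in allocate.go works on Volcano's node and task types), and only a single resource dimension is shown:

```go
// Minimal sketch of the admission check described above (made-up names,
// not the actual code in allocate.go), for a single resource dimension.
//
// A node passes the predicate stage whenever futureIdle (idle + releasing)
// covers the request, even if idle alone does not -- the task can then only
// be pipelined to the node, yet the node is still scored alongside nodes
// that could host the task immediately.
func passesPredicate(idle, releasing, request float64) bool {
	futureIdle := idle + releasing
	return request <= futureIdle
}
```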
What you expected to happen:
Nodes with terminating pods should get a lower score in the score period than nodes with enough idle resources.
How to reproduce it (as minimally and precisely as possible):
1. Enable the `binpack` plugin and give it a high weight.
2. Compose a scenario where a pod is stuck terminating on a node whose idle resources do not meet the memory request of the pending pod, but whose CPU is enough, and whose idle + releasing resources just fit the pending pod (idle.mem + releasing.mem = request.mem, idle.cpu = request.cpu).
3. When the `binpack` plugin scores this node, it scores 0 for memory (volcano/pkg/scheduler/plugins/binpack/binpack.go, lines 248 to 251 in bed4a04), while the node's binpack score still comes out at 8/9 (lines 229 to 239 in bed4a04).
The pending pod is then pipelined to this node even though other nodes have enough idle resources, because the `binpack` plugin scores this node higher than the others (see the sketch below).
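To make the scoring effect concrete, here is a minimal, self-contained sketch of binpack-style scoring. The `dimScore` helper, the resource weights, and the numbers are illustrative assumptions rather than the exact formula or configuration used by binpack.go:

```go
package main

import "fmt"

// Simplified binpack-style per-dimension score (an approximation, not the
// exact code in binpack.go): the fuller the node would be after placing the
// task, the higher the score; a dimension the request cannot currently fit
// into scores 0 instead of filtering the node out.
func dimScore(request, used, allocatable, weight float64) float64 {
	usedFinally := request + used
	if usedFinally > allocatable {
		return 0
	}
	return weight * usedFinally / allocatable
}

func main() {
	const cpuWeight, memWeight = 5.0, 1.0 // assumed resource weights

	// Node A mirrors the reproduce scenario: a terminating pod still holds
	// memory, so the memory request only fits on futureIdle and scores 0,
	// while idle.cpu == request.cpu makes the CPU dimension fully packed.
	a := (dimScore(4, 4, 8, cpuWeight) + dimScore(4, 6, 8, memWeight)) / (cpuWeight + memWeight)

	// Node B has plenty of idle CPU and memory.
	b := (dimScore(4, 0, 8, cpuWeight) + dimScore(4, 0, 8, memWeight)) / (cpuWeight + memWeight)

	fmt.Printf("node with terminating pod: %.2f\n", a) // 0.83
	fmt.Printf("node with enough idle:     %.2f\n", b) // 0.50
}
```

Because the memory dimension only contributes 0 instead of excluding the node, the tightly packed CPU dimension dominates, the node with the terminating pod wins, and the pod gets pipelined there instead of being allocated to a node where it could run immediately.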
Anything else we need to know?:
Environment:
Volcano Version:
Kubernetes version (use kubectl version):
Cloud provider or hardware configuration:
OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
Others:
Besides the binpack plugin, I understand that other scoring plugins may run into similar problems, such as task-topology, nodeorder, etc.
I was wondering whether this generic problem could be solved like this:
When the allocate action scores nodes, it divides them into two groups: the first group contains nodes whose idle resources meet the task's resource request, and the second group contains nodes whose future idle resources meet the request.
First, score the first group and, if a suitable node is found, schedule the task to it; if the first group has no node that meets the resource request, then score the second group and select a suitable node from it.
In this way, the pod is first dispatched to a node that meets its resource requirements in the current session, so it will not stay pending for a long time. If no node in the current session meets the requirements, it can still be scheduled to wait on a node whose future idle resources fit (see the sketch below).
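A rough sketch of that two-group fallback is below. The `resource`, `node`, and `pickBest` names are placeholders, not existing Volcano APIs, and a real implementation would reuse the allocate action's predicate and scoring machinery:

```go
// Placeholder types for the sketch; the real implementation would use
// Volcano's api.NodeInfo / api.Resource and the session's scoring helpers.
type resource struct{ cpu, mem float64 }

func (r resource) add(o resource) resource { return resource{r.cpu + o.cpu, r.mem + o.mem} }
func (r resource) covers(o resource) bool  { return o.cpu <= r.cpu && o.mem <= r.mem }

type node struct{ idle, releasing resource }

// selectNode scores nodes that fit the request right now first, and only
// falls back to nodes that fit on futureIdle (idle + releasing) when no
// node can host the task immediately.
func selectNode(request resource, candidates []node, pickBest func([]node) node) (node, bool) {
	var fitsNow, fitsLater []node
	for _, n := range candidates {
		switch {
		case n.idle.covers(request): // can be allocated in this session
			fitsNow = append(fitsNow, n)
		case n.idle.add(n.releasing).covers(request): // only fits once releasing resources free up
			fitsLater = append(fitsLater, n)
		}
	}
	if len(fitsNow) > 0 {
		return pickBest(fitsNow), true // allocate immediately
	}
	if len(fitsLater) > 0 {
		return pickBest(fitsLater), true // pipeline and wait
	}
	return node{}, false
}
```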