multi goroutine deal taskUnschedulable #3921

lishangyuzi · 2024-12-24T09:50:49Z

When scheduling large-scale jobs, I ran into a problem: when a job fails to be scheduled, every pod under that job has its PodCondition updated. Each update requires a round trip to the apiserver, so this step takes a long time. Could we handle this part of the logic with multiple goroutines?

volcano-sh-bot · 2024-12-24T09:50:52Z

Welcome @lishangyuzi!

It looks like this is your first PR to volcano-sh/volcano.

Thank you, and welcome to Volcano. 😃

volcano-sh-bot · 2024-12-24T09:50:58Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign lowang-bh
You can assign the PR to them by writing /assign @lowang-bh in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

pkg/scheduler/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

lishangyuzi · 2024-12-24T09:54:19Z

/assign @lowang-bh

lowang-bh · 2024-12-24T11:18:20Z

Have you increased the QPS of the kubeclient in the Volcano scheduler?

lishangyuzi · 2024-12-25T06:04:37Z

> Have you increased the QPS of the kubeclient in the Volcano scheduler?

The default kubeclient QPS already meets my expectations. It takes approximately 200 seconds for a job with 5000 pods to complete this stage.

volcano/cmd/scheduler/app/options/options.go, lines 127 to 128 in b169623:

    fs.Float32Var(&s.KubeClientOptions.QPS, "kube-api-qps", defaultQPS, "QPS to use while talking with kubernetes apiserver")
    fs.IntVar(&s.KubeClientOptions.Burst, "kube-api-burst", defaultBurst, "Burst to use while talking with kubernetes apiserver")

volcano/cmd/scheduler/app/options/options.go, lines 40 to 41 in b169623:

    defaultQPS   = 2000.0
    defaultBurst = 2000

The parameters related to my API server QPS are as follows:

--max-mutating-requests-inflight=4000
--max-requests-inflight=2000
--watch-cache-sizes=node#2000,pod#10000

multi goroutine update podcondition

bd8de08

volcano-sh-bot requested review from alcorj-mizar and shinytang6 December 24, 2024 09:50

volcano-sh-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Dec 24, 2024

volcano-sh-bot assigned lowang-bh Dec 24, 2024

lishangyuzi mentioned this pull request Dec 25, 2024

[Enhancement]Optimize volcano end-to-end scheduling large-scale pod performance #3852 (Open, 6 tasks)
