fix:When queue resources are insufficient or about to be insufficient, instances cannot be generated. #3198

Closed
LY-today opened this issue Nov 13, 2023 · 15 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@LY-today
Contributor

What happened:
When queue resources are insufficient or about to be insufficient, instances cannot be generated

What you expected to happen:
Instances can also be generated when queue resources are insufficient or about to be insufficient.

How to reproduce it (as minimally and precisely as possible):
When the amount of a resource already allocated in the queue plus the amount requested by the new task exceeds the upper limit configured for that resource in the queue, the failure to create the instance reproduces reliably.
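
The condition described above can be sketched as a small check. This is only an illustrative model, not Volcano's actual code: the function name `exceeds_queue_limit`, the dictionary layout, and the numbers are all hypothetical.

```python
from typing import Dict


def exceeds_queue_limit(allocated: Dict[str, float],
                        requested: Dict[str, float],
                        capability: Dict[str, float]) -> bool:
    """Return True if any requested resource would push the queue past its limit."""
    for name, amount in requested.items():
        limit = capability.get(name)
        if limit is None:
            # Assumption: a resource with no configured upper limit is unlimited.
            continue
        if allocated.get(name, 0.0) + amount > limit:
            return True
    return False


# Hypothetical numbers: queue capped at 8 CPUs, 7 already allocated, new task asks for 2.
print(exceeds_queue_limit({"cpu": 7}, {"cpu": 2}, {"cpu": 8}))
```

With these numbers the check is true (7 + 2 > 8), which is the situation in which the instance is not created.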

Anything else we need to know?:

  • vcJob event:Warning PodGroupPending 38s vc-controller-manager PodGroup default:dlp-stage-ecu-test6 unschedule,reason: 1/0 tasks in gang unschedulable: pod group is not ready, 1 minAvailable
  • podGroup event:Normal Unschedulable 92s (x13 over 105s) volcano queue resource quota insufficient

Environment:

  • Volcano Version:
    v1.7.0
  • Kubernetes version (use kubectl version):
    v1.18.2
  • Cloud provider or hardware configuration:
    none
  • OS (e.g. from /etc/os-release):
    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"
    CENTOS_MANTISBT_PROJECT="CentOS-7"
    CENTOS_MANTISBT_PROJECT_VERSION="7"
    REDHAT_SUPPORT_PRODUCT="centos"
    REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Kernel (e.g. uname -a):
    Linux 3.10.0-1160.45.1.el7.x86_64 #1 SMP Wed Oct 13 17:20:51 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
    none
  • Others:
    none
@LY-today added the kind/bug label Nov 13, 2023
@Monokaix
Member

Hi, what does the instance mean? pod or anything else?

@LY-today
Contributor Author

> Hi, what does the instance mean? pod or anything else?

Hi, pod

@Monokaix
Member

> > Hi, what does the instance mean? pod or anything else?
>
> Hi, pod

If the queue does not have sufficient resources, new tasks will not be scheduled, which I think is normal :)

@LY-today
Contributor Author

> > > Hi, what does the instance mean? pod or anything else?
> >
> > Hi, pod
>
> If the queue does not have sufficient resources, new tasks will not be scheduled, which I think is normal :)

Maybe my description was wrong. The problem is not that the instance cannot be scheduled and stays pending, but that no instance is generated at all. That is what I think is unreasonable.

@LY-today
Contributor Author

@Monokaix The core of the problem is not that instances cannot be scheduled and pending occurs when resources are scarce, but that no instances are created at all.

@LY-today reopened this Nov 14, 2023
@william-wang
Member

@LY-today Did you configure the enqueue action in scheduler-configmap and enable the delay pod creation feature? Please add your scheduler configmap if possible. Here is the introduction of the delay pod creation feature:
https://github.com/volcano-sh/volcano/blob/master/docs/design/delay-pod-creation.md
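
As a rough illustration of why no pod appears at all, here is a minimal sketch of the gating idea behind delayed pod creation. It assumes the behavior suggested by this thread and the linked design document, namely that pods are only created once the PodGroup has left the Pending phase and become Inqueue. The names `PodGroupPhase` and `should_create_pods` are hypothetical and are not Volcano's API.

```python
from enum import Enum


class PodGroupPhase(Enum):
    # Only the two phases relevant to this sketch are modeled.
    PENDING = "Pending"
    INQUEUE = "Inqueue"


def should_create_pods(phase: PodGroupPhase, delay_pod_creation: bool) -> bool:
    """Decide whether a controller would create pods for a job (simplified model)."""
    if not delay_pod_creation:
        # Without the delay feature, pods are created right away and may sit Pending.
        return True
    # With the delay feature, creation waits until the PodGroup has been admitted.
    return phase is PodGroupPhase.INQUEUE


# A PodGroup that was never admitted because the queue quota is insufficient:
print(should_create_pods(PodGroupPhase.PENDING, delay_pod_creation=True))
```

Under this model, a PodGroup that stays Pending because of "queue resource quota insufficient" never reaches the point where pods are created, which would match the events reported above.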

@LY-today
Contributor Author

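> @LY-today Did you configure the enqueue action in scheduler-configmap and enable the delay pod creation feature? Please add your scheduler configmap if possible. Here is the introduction of the delay pod creation feature: https://github.com/volcano-sh/volcano/blob/master/docs/design/delay-pod-creation.md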

Thanks for your feedback, I tested it and found the solution

@LY-today
Contributor Author

@william-wang In this scenario, the problem is solved if I simply disable the enqueue action. Could disabling it introduce other problems?
The added apiserver pressure and slower scheduling are acceptable. Are there other effects?
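
For readers who want to see what disabling the enqueue action amounts to, here is a small sketch that drops `enqueue` from a scheduler `actions` string. The sample value `"enqueue, allocate, backfill"` is an assumed typical setting, not the reporter's actual configuration (which was not posted), and the helper `without_enqueue` is hypothetical.

```python
def without_enqueue(actions: str) -> str:
    """Return the comma-separated actions list with the enqueue action removed."""
    kept = [
        action.strip()
        for action in actions.split(",")
        if action.strip() and action.strip() != "enqueue"
    ]
    return ", ".join(kept)


# Assumed example value for the scheduler's `actions` field.
print(without_enqueue("enqueue, allocate, backfill"))  # allocate, backfill
```

The resulting string would then replace the `actions` value in the scheduler configmap.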

@lowang-bh
Member

lowang-bh commented Nov 14, 2023

> The impact of apiserver pressure and slow scheduling is acceptable. Are there other effects?

Before release-1.6, if there was no enqueue action, the podgroup would not be enqueued and the job would not be scheduled.

After that version, scheduling without the enqueue action is supported. FYI: 91981bf48

@william-wang
Member

> @william-wang For this scenario, it can be solved if I directly close the enqueue action. I would like to ask whether other problems may be introduced after closing it? The impact of apiserver pressure and slow scheduling is acceptable. Are there other effects?

@LY-today There are no other effects without enqueue.

@LY-today
Contributor Author

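> > The impact of apiserver pressure and slow scheduling is acceptable. Are there other effects?
>
> Before release-1.6, if there is no enqueue action, the podgroup will not be enqueued and the job will not be scheduled.
>
> After that version, scheduling without the enqueue action is supported. FYI: 91981bf48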

Thank you for your feedback

@LY-today
Contributor Author

> > @william-wang For this scenario, it can be solved if I directly close the enqueue action. I would like to ask whether other problems may be introduced after closing it? The impact of apiserver pressure and slow scheduling is acceptable. Are there other effects?
>
> @LY-today There are no other effects without enqueue.

Thank you for your feedback

@Monokaix
Member

/close

@volcano-sh-bot
Contributor

@Monokaix: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
