Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

float64 error from arguments.go:65 when adding overcommit-factor to configmap #2053

Closed
tFable opened this issue Mar 6, 2022 · 11 comments
Closed
Assignees

Comments

@tFable
Copy link

tFable commented Mar 6, 2022

What happened:
After changing the configmap to add a overcommit-factor and installing volcano from this file on the v1.5 branch and changing the configmap getting this log continuously

arguments.go:65] Could not parse argument: %!s(float64=1) for key overcommit-factor to float64, with err strconv.ParseFloat: parsing "": invalid syntax

This is how the overcommit-factor was defined:

# Source: volcano/templates/scheduler.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
      - name: conformance
    - plugins:
      - name: overcommit
        arguments:
          overcommit-factor: 1.0
      - name: drf
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack

However, if on the volcano-development.yaml we change the docker images to pull a specific version (1.5.0) of the below containers (instead of :latest that is the default on the branch) the error doesn't occur.

volcanosh/vc-webhook-manager:v1.5.0
volcanosh/vc-controller-manager:v1.5.0
volcanosh/vc-scheduler:v1.5.0

That said, I am still not seeing the overcommit-factor be 1.0 (queued jobs are still having pending pods created).

What you expected to happen:

For no float error message in the logs and for the overcommit-ratio change to actually "take".

How to reproduce it (as minimally and precisely as possible):
Install Volcano from the volcano-development.yaml file of the v1.5 branch. Prior to that, add an argument for the overcommit-ratio to the overcommit plugin.

Anything else we need to know?:

Environment:

  • Volcano Version: 1.5
  • Kubernetes version (use kubectl version):1.23
  • Cloud provider or hardware configuration:vSphere on-repm
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@tFable tFable added the kind/bug Categorizes issue or PR as related to a bug. label Mar 6, 2022
@tFable tFable changed the title Error from arguments.go:65 float64 Error from arguments.go:65 when adding overcommit-factor to configmap Mar 6, 2022
@tFable tFable changed the title float64 Error from arguments.go:65 when adding overcommit-factor to configmap float64 error from arguments.go:65 when adding overcommit-factor to configmap Mar 6, 2022
@william-wang
Copy link
Member

/assign @Thor-wl please help to have a look.

@Thor-wl Thor-wl added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Mar 7, 2022
@Thor-wl
Copy link
Contributor

Thor-wl commented Mar 7, 2022

@tFable Thanks for the feedback.
/assign

@Thor-wl
Copy link
Contributor

Thor-wl commented Mar 7, 2022

@tFable Hey, I've overviewed and checked the reproduce steps mentioned above. And it seems that the docker image tag is also v1.5.0 in the branch release-1.5. So what's the difference between the modifications you do above? Or have I missed something?

@Thor-wl
Copy link
Contributor

Thor-wl commented Mar 7, 2022

Well, I reproduce this phenomenon with the master branch. Let me take a fix.

@Thor-wl
Copy link
Contributor

Thor-wl commented Mar 7, 2022

I got the root reason. The configuration you set for scheduler is format error:

overcommit-factor: 1.0

There is a redundant blank space between : and 1. So the parameter for parse will be "". Pls keep no space between : and 1.

@Thor-wl Thor-wl removed kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Mar 7, 2022
@Thor-wl
Copy link
Contributor

Thor-wl commented Mar 7, 2022

/close

@volcano-sh-bot
Copy link
Contributor

@Thor-wl: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tFable
Copy link
Author

tFable commented Mar 7, 2022

@tFable Hey, I've overviewed and checked the reproduce steps mentioned above. And it seems that the docker image tag is also v1.5.0 in the branch release-1.5. So what's the difference between the modifications you do above? Or have I missed something?

Hey @Thor-wl , I found out what caused me to not see the image tags. I was using the wrong branch not branch release-v1.5. Sorry for the confusion on this one.

When I switched to release-1.5 I do see the correct tags. Sorry again.

@tFable
Copy link
Author

tFable commented Mar 7, 2022

I got the root reason. The configuration you set for scheduler is format error:

overcommit-factor: 1.0

There is a redundant blank space between : and 1. So the parameter for parse will be "". Pls keep no space between : and 1.

Hey @Thor-wl,

I took the format directly from the code

actions: "enqueue, allocate, backfill"
tiers:
- plugins:
- name: overcommit
arguments:
overcommit-factor: 1.0

What is the exact format the overcommit-factor should be written as?

This is something also being discussed in #2052 with @adamnovak

@Thor-wl
Copy link
Contributor

Thor-wl commented Mar 8, 2022

@tFable Hey, I noticed that there are some other reports about this bug and it occurs in the master branch. I will try to reproduce it and give a fix. As to the format, I think what you configure is acceptable for yaml file. It should be more robust for the code to parse the parameters.

@Thor-wl
Copy link
Contributor

Thor-wl commented Mar 8, 2022

@tFable I've submitted the fix and given the analysis in the PR(#2061). And tests are ok locally. Pls make a confirmation. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants