
Executor resource requests not applied to init/wait containers #6809

Closed
zorulo opened this issue Sep 27, 2021 · 11 comments · Fixed by #6879

zorulo (Contributor) commented Sep 27, 2021:

Summary

What happened/what you expected to happen?

We've recently tried upgrading argo from 2.11.x to the latest to take advantage of new features, namely the emissary executor.

However, when specifying executor resource requests for the init/wait containers in the workflow-controller-configmap (as documented), the resource requests no longer seem to be applied as they were before.

We've tried numerous versions from 2.12.x onwards but none seem to properly apply resource requests as expected.

What version is it broken in?

Anything >= 2.12.x

What version was it working in?

2.11.x

Diagnostics

Simple hello world workflow that reproduces this:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  entrypoint: hello
  templates:
  - name: hello
    container:
      image: alpine:3.6
      command: [sh, '-c']
      args: ["echo \"Hello world!\""]

Running kubectl describe pod <pod> on the resulting pod shows that there are no resource requests for the init/wait containers (3.1.12):

  wait:
    Container ID:  docker://a2f8d0f4a7e879eec2262d817d526ba4d63c360b47a3cb32b8d3df744b0089b7
    Image:         ...
    Image ID:      ...
    Port:          <none>
    Host Port:     <none>
    Command:
      argoexec
      wait
      --loglevel
      info
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 27 Sep 2021 11:20:09 -0700
      Finished:     Mon, 27 Sep 2021 11:20:10 -0700
    Ready:          False
    Restart Count:  0
    Environment:
      ARGO_POD_NAME:                    hello-world-k22ng (v1:metadata.name)
      ARGO_CONTAINER_RUNTIME_EXECUTOR:  emissary
      GODEBUG:                          x509ignoreCN=0
      ARGO_CONTAINER_NAME:              wait
      ARGO_INCLUDE_SCRIPT_OUTPUT:       false

Expected output (2.11.8):

  wait:
    Container ID:  docker://d5c3015d2af78b9f537667b490335ba889e3bf31618ff8bc78be52a23b378a22
    Image:         ...
    Image ID:      ...
    Port:          <none>
    Host Port:     <none>
    Command:
      argoexec
      wait
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 27 Sep 2021 11:22:19 -0700
      Finished:     Mon, 27 Sep 2021 11:22:19 -0700
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:     2
      memory:  4Gi
    Environment:
      ARGO_POD_NAME:  hello-world-pvd72 (v1:metadata.name)

Our workflow controller configmap specifies as follows:

  executor: |
    resources:
      requests:
        cpu: 2
        memory: 4Gi

What Kubernetes provider are you using?

AWS EKS

What executor are you running? Docker/K8SAPI/Kubelet/PNS/Emissary

Tried docker and emissary


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

zorulo added the type/bug, type/regression (Regression from previous behavior), and triage labels Sep 27, 2021
sarabala1979 (Member):
@zorulo I am not able to reproduce this issue. I can see the resources from the configmap applied to the wait container. Please make sure your configmap is parsed correctly, which you can check in the controller log.

(screenshots omitted: pod description showing the wait container resources, and the configmap)

zorulo (Contributor, Author) commented Sep 28, 2021:

@sarabala1979 Thanks for looking into it. I can see the resources applied in our configmap when running kubectl describe cm <configmap>, output similar to what you showed above. Additionally, we are essentially using the same configmap for both 2.11.x and other versions, but it only seems to show up for 2.11.x.

sarabala1979 (Member):
@zorulo So 2.11.x is working as expected. Is my understanding correct?

alexec (Contributor) commented Oct 5, 2021:

I spent 10 minutes looking at the code and did not see any likely culprits.

Can you please check to see what configuration the controller is logging on start-up and that it matches what you expect?

zorulo (Contributor, Author) commented Oct 5, 2021:

@sarabala1979 Apologies, missed this, yes it seems fine on 2.11.x versions.

@alexec For further context we are using flux2 with kustomize to deploy/setup our clusters, including automatic deployment of argo workflows. We use the install.yaml downloaded from the argo github.

Sorry for truncated logs, trying to retain anonymity, but running kubectl logs -n argo <workflow-controller-pod> reveals pretty much the same thing on startup for both versions, save the executor which was an intentional change (keeping it at docker doesn't make a difference).

2.11.8:

containerRuntimeExecutor: docker
executor:
  name: ""
  resources:
    requests:
      cpu: "2"
      memory: 4Gi

and 3.1.12:

containerRuntimeExecutor: emissary
executor:
  name: ""
  resources:
    requests:
      cpu: "2"
      memory: 4Gi

alexec (Contributor) commented Oct 5, 2021:

Could you share the raw config map?

zorulo (Contributor, Author) commented Oct 5, 2021:

@alexec It essentially looks like this, truncated again just for anonymity but we also have artifactRepository, persistence, and workflowDefaults specifications, all working as expected.

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
data:
  parallelism: "50"

  resourceRateLimit: |
    limit: 500
    burst: 100

  nodeEvents: |
    enabled: true

  containerRuntimeExecutor: emissary

  executor: |
    resources:
      requests:
        cpu: 2
        memory: 4Gi

alexec (Contributor) commented Oct 6, 2021:

Ah. I think I know what the issue is. You're only specifying requests, not limits. I can see in the code that if you don't specify limits, the executor resources are ignored entirely :(

return ctr != nil && len(ctr.Resources.Limits) != 0

Would you like to submit the fix? It should be really easy.
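The shape of the fix can be sketched as follows. This is a minimal illustration using simplified stand-in types, not the real Kubernetes API objects the controller actually uses (the real code operates on corev1.Container), but it shows the one-line logic change: treat a configured Requests field as sufficient, not just Limits.

```go
package main

import "fmt"

// ResourceList is a simplified stand-in for corev1.ResourceList
// (a map from resource name to quantity), for illustration only.
type ResourceList map[string]string

// Resources mirrors the shape of corev1.ResourceRequirements.
type Resources struct {
	Limits   ResourceList
	Requests ResourceList
}

// Container is a minimal stand-in for corev1.Container.
type Container struct {
	Resources Resources
}

// hasResources sketches the check quoted above. The buggy version
// only looked at Limits:
//
//	return ctr != nil && len(ctr.Resources.Limits) != 0
//
// so a config that set requests without limits was silently ignored.
// The fix is to treat either field as "resources were configured".
func hasResources(ctr *Container) bool {
	return ctr != nil &&
		(len(ctr.Resources.Limits) != 0 || len(ctr.Resources.Requests) != 0)
}

func main() {
	// A requests-only executor config, as in the reporter's configmap.
	requestsOnly := &Container{Resources: Resources{
		Requests: ResourceList{"cpu": "2", "memory": "4Gi"},
	}}
	fmt.Println(hasResources(requestsOnly)) // true with the fix
	fmt.Println(hasResources(&Container{})) // false: nothing configured
}
```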

zorulo (Contributor, Author) commented Oct 6, 2021:

@alexec aha! Yes, can confirm that adding limits to our configmap has the resource requests showing up now:

Containers:
  wait:
    Container ID:  <container-id>
    Image:         <image>
    Image ID:      <image-id>
    Port:          <none>
    Host Port:     <none>
    Command:
      argoexec
      wait
      --loglevel
      info
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 06 Oct 2021 10:16:26 -0700
      Finished:     Wed, 06 Oct 2021 10:16:27 -0700
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  4Gi
    Requests:
      cpu:     2
      memory:  4Gi

We definitely don't want to impose limits though; we work with extremely large and unpredictable datasets, which is why we have generous resource requests to begin with.
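Until a fix lands, the behaviour confirmed above suggests an interim workaround: also setting limits in the executor config makes the requests apply. The values below are illustrative only, mirroring the requests from earlier in this thread, not a recommendation:

```yaml
  executor: |
    resources:
      requests:
        cpu: 2
        memory: 4Gi
      # Workaround only: limits must be present for the
      # requests to be applied on affected versions.
      limits:
        cpu: 2
        memory: 4Gi
```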

To be honest I'm not totally sure what submitting a fix entails, so would prefer someone on your side doing it.

alexec added the good first issue label and removed the type/regression label Oct 6, 2021
alexec (Contributor) commented Oct 6, 2021:

It is not hard to create a fix; here is the guide:

https://github.com/argoproj/argo-workflows/blob/master/docs/CONTRIBUTING.md

If you don't have the time, then we'll see who is available at the next weekly.

zorulo (Contributor, Author) commented Oct 6, 2021:

@alexec ah okay I'll take a look and give it a shot then, thanks!

zorulo added a commit to zorulo/argo-workflows that referenced this issue Oct 6, 2021
Signed-off-by: zorulo <artist.swimmer.cheng@gmail.com>
alexec pushed a commit that referenced this issue Oct 7, 2021
Signed-off-by: zorulo <artist.swimmer.cheng@gmail.com>
sarabala1979 mentioned this issue Oct 15, 2021
sarabala1979 mentioned this issue Oct 19, 2021
sarabala1979 pushed a commit that referenced this issue Oct 19, 2021
Signed-off-by: zorulo <artist.swimmer.cheng@gmail.com>
kriti-sc pushed a commit to kriti-sc/argo-workflows that referenced this issue Oct 24, 2021 (argoproj#6879)

Signed-off-by: zorulo <artist.swimmer.cheng@gmail.com>
Signed-off-by: kriti-sc <kathuriakriti1@gmail.com>
alexec mentioned this issue Nov 5, 2021
sarabala1979 mentioned this issue Dec 15, 2021