Complex DAG of container set nodes always failed #12997

Closed
GlobeFishNG opened this issue Apr 30, 2024 · 1 comment · Fixed by #13048

GlobeFishNG commented Apr 30, 2024

Pre-requisites

  • I have double-checked my configuration
  • I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what did you expect to happen?

The workflow has the following structure:

  1. The outermost layer of the workflow is a DAG.
  2. The DAG dependencies are quite complicated (40+ nodes, several of which have more than 5 dependencies).
  3. Each DAG node is a container set.
  4. Each container set has a simple inner DAG (prepare-inputs -> main -> prepare-outputs).

The workflow below fails every time. When I change the inner template from a container set to a DAG or step groups, it succeeds.
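The three inner templates in the reproduction workflow (`whalesay-step-groups`, `whalesay-dag`, `whalesay-container-set`) all describe the same prepare-inputs -> main -> prepare-outputs chain, which is why swapping one for another should not change the outer DAG's behavior. The sketch below illustrates that equivalence by normalizing each form into a set of (upstream, downstream) edges; the helper names are hypothetical and this is not how the Argo controller represents templates internally.

```python
# Hypothetical helpers: reduce each inner template from the reproduction
# workflow to a set of (upstream, downstream) dependency edges.

def edges_from_steps(step_groups):
    """Step groups run in order: each step waits on every step of the previous group."""
    edges = set()
    for prev, curr in zip(step_groups, step_groups[1:]):
        for up in prev:
            for down in curr:
                edges.add((up, down))
    return edges


def edges_from_dag(tasks):
    """DAG tasks: here each `depends` names a single upstream task (or None)."""
    return {(dep, name) for name, dep in tasks.items() if dep is not None}


def edges_from_container_set(containers):
    """Container set: each container lists its `dependencies` explicitly."""
    return {(dep, name) for name, deps in containers.items() for dep in deps}


steps = [["prepare-inputs"], ["main"], ["prepare-outputs"]]
dag = {"prepare-inputs": None, "main": "prepare-inputs", "prepare-outputs": "main"}
container_set = {
    "prepare-inputs": [],
    "main": ["prepare-inputs"],
    "prepare-outputs": ["main"],
}

expected = {("prepare-inputs", "main"), ("main", "prepare-outputs")}
assert edges_from_steps(steps) == expected
assert edges_from_dag(dag) == expected
assert edges_from_container_set(container_set) == expected
print(sorted(expected))
```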

Expectation: a container set nested as a node in a complex DAG should work just as a DAG or step group template does.
What happened:

  • The workflow fails with the following error:
    &{0x3a297a0 map[namespace:pipeline workflow:pipeline-test-point-simplefied-g5gbq] 2024-04-30 03:46:14.425189512 +0000 UTC m=+158960.631405283 panic <nil> was unable to obtain node for pipeline-test-point-simplefied-g5gbq-2664608158 <nil> <nil> }
  • The DAG shown in the GUI is wrong: its dependency tree is invalid. For example, I and J have the same dependency (H), but they are drawn as if J depended on I:
    [screenshot]
    • As far as I have tested, the DAGs shown in the Argo GUI are wrong even for some very simple container sets. I suspect the crash may be related to this incorrect DAG behavior.
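To make the expected graph around I and J concrete, here is a minimal sketch that extracts the upstream tasks referenced by a few of the `pipeline` DAG's `depends` expressions (copied from the reproduction workflow). It assumes every expression uses only `&&` between `<task>.<result>` terms, as in this workflow; it is an illustration, not the Argo UI's graph-building code.

```python
import re

# Subset of the `pipeline` DAG's `depends` expressions from the reproduction.
DEPENDS = {
    "A": "",
    "G": "E.Succeeded && F.Succeeded",
    "H": "A.Succeeded && G.Omitted",
    "I": "H.Succeeded",
    "J": "H.Succeeded",
    "SSR-B": "I.Succeeded && J.Succeeded",
}

# Each reference looks like <task>.<result>; task names may contain hyphens (SSR-B).
REF = re.compile(r"([A-Za-z0-9-]+)\.[A-Za-z]+")


def parents(expr):
    """Return the set of task names referenced by a `depends` expression."""
    return set(REF.findall(expr))


graph = {task: parents(expr) for task, expr in DEPENDS.items()}

# I and J have exactly the same upstream task, H ...
assert graph["I"] == graph["J"] == {"H"}
# ... and neither depends on the other, so they should be drawn as siblings.
assert "I" not in graph["J"] and "J" not in graph["I"]
print(graph["I"], graph["J"])
```

Under these assumptions, the only edges into I and J come from H, so an I -> J edge in the GUI does not correspond to anything in the workflow spec.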

Version

latest (3.5.6)

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: pipeline-test-point-simplefied
spec:
  entrypoint: pipeline
  activeDeadlineSeconds: 172800
  arguments:
    parameters:
    - name: runNums
      value: '["006"]'
  templates:
  - name: whalesay # name of the template
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["hello world"]
      resources: # limit the resources
        limits:
          memory: 32Mi
          cpu: 100m
  - name: whalesay-step-groups
    steps:
    - - name: prepare-inputs
        template: whalesay
    - - name: main
        template: whalesay
    - - name: prepare-outputs
        template: whalesay
  - name: whalesay-dag
    dag:
      tasks:
      - name: prepare-inputs
        template: whalesay
      - name: main
        depends: prepare-inputs
        template: whalesay
      - name: prepare-outputs
        depends: main
        template: whalesay
  - name: whalesay-container-set # name of the template
    containerSet:
      containers:
      - name: prepare-inputs
        image: docker/whalesay
        command: [cowsay]
        args: ["hello world"]
        resources: # limit the resources
          limits:
            memory: 32Mi
            cpu: 100m
      - name: main
        dependencies:
        - prepare-inputs
        image: docker/whalesay
        command: [cowsay]
        args: ["hello world"]
        resources: # limit the resources
          limits:
            memory: 32Mi
            cpu: 100m
      - name: prepare-outputs
        dependencies:
        - main
        image: docker/whalesay
        command: [cowsay]
        args: ["hello world"]
        resources: # limit the resources
          limits:
            memory: 32Mi
            cpu: 100m
  - name: pipeline
    dag:
      tasks:
      - name: A
        template: whalesay-container-set
      - name: B
        depends: A.Succeeded
        template: whalesay-container-set
      - name: C
        depends: B.Succeeded
        when: 'false'
        template: whalesay-container-set
      - name: D
        depends: C.Succeeded
        template: whalesay-container-set
      - name: E
        depends: D.Succeeded
        template: whalesay-container-set
      - name: F
        depends: D.Succeeded
        template: whalesay-container-set
      - name: G
        depends: E.Succeeded && F.Succeeded
        template: whalesay-container-set
      - name: H
        depends: A.Succeeded && G.Omitted
        template: whalesay-container-set
      - name: I
        depends: H.Succeeded
        template: whalesay-container-set
      - name: J
        depends: H.Succeeded
        template: whalesay-container-set
      - name: K
        depends: A.Succeeded && G.Omitted
        template: whalesay-container-set
      - name: L
        depends: B.Succeeded
        template: whalesay-container-set
      - name: M
        depends: B.Succeeded
        template: whalesay-container-set
      - name: N1
        depends: M.Succeeded && G.Omitted
        template: whalesay-container-set
      - name: O
        depends: N1.Succeeded
        template: whalesay-container-set
      - name: P
        depends: O.Succeeded
        template: whalesay-container-set
      - name: Q
        depends: O.Succeeded
        template: whalesay-container-set
        withParam: '{{workflow.parameters.runNums}}'
      - name: R
        depends: O.Succeeded
        template: whalesay-container-set
      - name: T
        depends: R.Succeeded
        template: whalesay-container-set
      - name: S
        depends: O.Succeeded
        template: whalesay-container-set
      - name: U
        depends: O.Succeeded
        template: whalesay-container-set
      - name: V
        depends: O.Succeeded
        template: whalesay-container-set
      - name: W
        depends: M.Succeeded && G.Omitted
        template: whalesay-container-set
      - name: X
        depends: Q.Succeeded
        template: whalesay-container-set
      - name: Y1
        depends: A.Succeeded && G.Omitted
        template: whalesay-container-set
      - name: Z
        depends: Y1.Succeeded && Q.Succeeded && R.Succeeded && V.Succeeded && U.Succeeded && S.Succeeded
        template: whalesay-container-set
      ### SSR
      - name: SSR-A
        depends: A.Succeeded
        template: whalesay-container-set
      - name: SSR-B
        depends: I.Succeeded && J.Succeeded
        template: whalesay-container-set

      - name: SSR-C
        depends: O.Succeeded
        template: whalesay-container-set
      - name: SSR-D
        depends: P.Succeeded
        template: whalesay-container-set
      - name: SSR-E
        depends: P.Succeeded
        template: whalesay-container-set
      - name: SSR-F
        depends: P.Succeeded
        template: whalesay-container-set
      - name: SSR-G
        depends: I.Succeeded && R.Succeeded
        template: whalesay-container-set
      - name: SSR-H
        depends: I.Succeeded && V.Succeeded
        template: whalesay-container-set
      - name: SSR-I
        depends: I.Succeeded && S.Succeeded
        template: whalesay-container-set
      - name: SSR-J
        depends: I.Succeeded && R.Succeeded
        template: whalesay-container-set
      - name: SSR-K
        depends: I.Succeeded && V.Succeeded
        template: whalesay-container-set
      - name: SSR-L
        depends: I.Succeeded && S.Succeeded
        template: whalesay-container-set
      - name: SSR-M
        depends: W.Succeeded
        template: whalesay-container-set
      - name: SSR-N
        depends: W.Succeeded
        template: whalesay-container-set
      - name: SSR-O
        depends: I.Succeeded && Z.Succeeded
        template: whalesay-container-set
      - name: SSR-P
        depends: I.Succeeded && Z.Succeeded
        template: whalesay-container-set
      - name: SSR-Q
        depends: I.Succeeded && Z.Succeeded
        template: whalesay-container-set
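
For reference, the `when: 'false'` on C and the `G.Omitted` conditions mean that only part of the outer DAG is expected to run. The sketch below is a simplified model of Argo's documented `depends` results for the first eleven tasks, under stated assumptions: a task whose `depends` expression is false is Omitted, a task whose `when` is false is Skipped, and every task that actually runs succeeds. Only `&&` expressions are handled (real `depends` also supports `||`, `!` and parentheses), and the task names and expressions are copied from the workflow above.

```python
import re

# Subset of the `pipeline` DAG, listed in dependency order.
# Each entry: task -> (depends expression, when expression or None).
TASKS = {
    "A": ("", None),
    "B": ("A.Succeeded", None),
    "C": ("B.Succeeded", "false"),
    "D": ("C.Succeeded", None),
    "E": ("D.Succeeded", None),
    "F": ("D.Succeeded", None),
    "G": ("E.Succeeded && F.Succeeded", None),
    "H": ("A.Succeeded && G.Omitted", None),
    "I": ("H.Succeeded", None),
    "J": ("H.Succeeded", None),
    "K": ("A.Succeeded && G.Omitted", None),
}

REF = re.compile(r"([A-Za-z0-9-]+)\.([A-Za-z]+)")


def evaluate(expr, phases):
    """Evaluate an &&-only `depends` expression against already-decided phases."""
    if not expr:
        return True
    terms = [t.strip() for t in expr.split("&&")]
    return all(phases[m.group(1)] == m.group(2) for m in map(REF.fullmatch, terms))


def simulate(tasks):
    """Assign a phase to each task, assuming every task that runs succeeds."""
    phases = {}
    for name, (depends, when) in tasks.items():
        if not evaluate(depends, phases):
            phases[name] = "Omitted"
        elif when == "false":
            phases[name] = "Skipped"
        else:
            phases[name] = "Succeeded"
    return phases


phases = simulate(TASKS)
for name, phase in phases.items():
    print(f"{name}: {phase}")
```

In this model C is Skipped, D through G are Omitted, and H, I, J and K run, so the reproduction exercises both the skipped/omitted branch and the main branch of container-set nodes.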

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

Logs from your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded
GlobeFishNG (author) commented:

Logs from the workflow controller

controller.log

Logs from your workflow's wait container

error.log

tczhao self-assigned this May 1, 2024
agilgur5 pushed a commit that referenced this issue Jul 19, 2024
…3048)

Signed-off-by: Tianchu Zhao <evantczhao@gmail.com>
agilgur5 added the P3 Low priority label Jul 19, 2024
agilgur5 added this to the v3.5.x patches milestone Jul 30, 2024
agilgur5 pushed a commit that referenced this issue Jul 30, 2024
…3048)

Signed-off-by: Tianchu Zhao <evantczhao@gmail.com>
(cherry picked from commit a154a93)
Joibel pushed a commit to pipekit/argo-workflows that referenced this issue Sep 19, 2024
Joibel pushed a commit that referenced this issue Sep 20, 2024
…3048)

Signed-off-by: Tianchu Zhao <evantczhao@gmail.com>