Workflow succeeded with some node failed #6834

Closed
book987 opened this issue Sep 30, 2021 · 0 comments · Fixed by #6912

book987 commented Sep 30, 2021

Summary

The workflow is reported as Succeeded even though some of its nodes failed.

Reproduced on v3.1.6, v3.1.13, and v3.2.0-rc4.

Diagnostics

Currently the nodes are organized as shown in the screenshot below. This happens because a TaskGroup's outbound nodes are derived from its children. A expands to nothing (it is an empty TaskGroup node), so it is connected directly to TaskGroup B. As a result, when C calls connectDependencies, it walks A (TaskGroup) -> B (TaskGroup) -> B (DAG) and finds nothing: B (Pod) is still Running, so at that moment B (DAG) has no outbound nodes.

[Screenshot 2021-09-30 at 2:00:48 PM]
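
The traversal described above can be sketched with a toy model. This is not the Argo controller code: the node table, the `outbound_nodes`, `connect_dependency`, and `reachable` helpers, and the short node names are all hypothetical, and the resolution rules are a simplified assumption (a TaskGroup collects outbound nodes from its children, a DAG reports only the outbound nodes it has recorded so far, any other node is its own outbound node).

```python
def outbound_nodes(nodes, node_id):
    """Resolve outbound nodes under the simplified rules assumed above."""
    node = nodes[node_id]
    if node["type"] == "TaskGroup":
        # A TaskGroup has no outbound nodes of its own; ask its children.
        result = []
        for child in node.get("children", []):
            result.extend(outbound_nodes(nodes, child))
        return result
    if node["type"] == "DAG":
        # A DAG that is still running has not recorded any outbound nodes yet.
        return list(node.get("outboundNodes", []))
    return [node_id]


def connect_dependency(nodes, dependency_id, dependent_id):
    """Attach dependent_id as a child of every outbound node of dependency_id."""
    parents = outbound_nodes(nodes, dependency_id)
    for parent in parents:
        children = nodes[parent].setdefault("children", [])
        if dependent_id not in children:
            children.append(dependent_id)
    return parents


def reachable(nodes, root_id):
    """Collect every node that can be reached from root_id through children."""
    seen, stack = set(), [root_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(nodes[current].get("children", []))
    return seen


# State at the moment C resolves its dependency on A (hypothetical short names).
nodes = {
    "main": {"type": "DAG", "children": ["A"]},
    "A": {"type": "TaskGroup", "children": ["B"]},  # empty expansion, linked to B
    "B": {"type": "TaskGroup", "children": ["B(0)"]},
    "B(0)": {"type": "DAG", "children": ["B(0).task"], "outboundNodes": []},
    "B(0).task": {"type": "Pod", "phase": "Running"},
    "C": {"type": "TaskGroup", "children": []},
}

attached_to = connect_dependency(nodes, "A", "C")
print(attached_to)                       # []
print("C" in reachable(nodes, "main"))   # False
```

In this model C ends up attached to no parent, which matches the status dump in this report: the TaskGroup node for C does not appear in any other node's `children` list.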

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: wrong-wf-status-test-
  namespace: argo
spec:
  entrypoint: main
  serviceAccountName: argo
  templates:
  - container:
      args:
      - -c
      - exit {{inputs.parameters.code}}
      command:
      - /bin/sh
      image: alpine
      name: main
    inputs:
      parameters:
      - name: code
    name: task
  - dag:
      tasks:
      - arguments:
          parameters:
          - name: code
            value: '{{inputs.parameters.code}}'
        name: task
        template: task
    inputs:
      parameters:
      - name: code
    name: task-dag
  - dag:
      tasks:
      - arguments:
          parameters:
          - name: code
            value: "0"
        name: A
        template: task
        withParam: '[]'
      - arguments:
          parameters:
          - name: code
            value: "0"
        dependencies:
        - A
        name: B
        template: task-dag
        withParam: '[{"code": "0"}]'
      - arguments:
          parameters:
          - name: code
            value: "1"
        dependencies:
        - A
        name: C
        template: task-dag
        withParam: '[{"code": "1"}]'
      - arguments:
          parameters:
          - name: code
            value: "0"
        dependencies:
        - B
        - C
        name: D
        template: task
    name: main
status:
  conditions:
  - status: "False"
    type: PodRunning
  - status: "True"
    type: Completed
  finishedAt: "2021-09-30T05:50:49Z"
  nodes:
    wrong-wf-status-test-6btkq:
      children:
      - wrong-wf-status-test-6btkq-175295760
      displayName: wrong-wf-status-test-6btkq
      finishedAt: "2021-09-30T05:50:49Z"
      id: wrong-wf-status-test-6btkq
      name: wrong-wf-status-test-6btkq
      outboundNodes:
      - wrong-wf-status-test-6btkq-259183855
      phase: Succeeded
      progress: 1/1
      resourcesDuration:
        cpu: 4
        memory: 4
      startedAt: "2021-09-30T05:50:39Z"
      templateName: main
      templateScope: local/wrong-wf-status-test-6btkq
      type: DAG
    wrong-wf-status-test-6btkq-175295760:
      boundaryID: wrong-wf-status-test-6btkq
      children:
      - wrong-wf-status-test-6btkq-225628617
      displayName: A
      finishedAt: "2021-09-30T05:50:39Z"
      id: wrong-wf-status-test-6btkq-175295760
      name: wrong-wf-status-test-6btkq.A
      phase: Succeeded
      progress: 1/1
      resourcesDuration:
        cpu: 4
        memory: 4
      startedAt: "2021-09-30T05:50:39Z"
      templateName: task
      templateScope: local/wrong-wf-status-test-6btkq
      type: TaskGroup
    wrong-wf-status-test-6btkq-208850998:
      boundaryID: wrong-wf-status-test-6btkq
      children:
      - wrong-wf-status-test-6btkq-1961584371
      displayName: C
      finishedAt: "2021-09-30T05:50:49Z"
      id: wrong-wf-status-test-6btkq-208850998
      name: wrong-wf-status-test-6btkq.C
      phase: Failed
      progress: 1/1
      resourcesDuration:
        cpu: 4
        memory: 4
      startedAt: "2021-09-30T05:50:39Z"
      templateName: task-dag
      templateScope: local/wrong-wf-status-test-6btkq
      type: TaskGroup
    wrong-wf-status-test-6btkq-225628617:
      boundaryID: wrong-wf-status-test-6btkq
      children:
      - wrong-wf-status-test-6btkq-1528210745
      displayName: B
      finishedAt: "2021-09-30T05:50:49Z"
      id: wrong-wf-status-test-6btkq-225628617
      name: wrong-wf-status-test-6btkq.B
      phase: Succeeded
      progress: 1/1
      resourcesDuration:
        cpu: 4
        memory: 4
      startedAt: "2021-09-30T05:50:39Z"
      templateName: task-dag
      templateScope: local/wrong-wf-status-test-6btkq
      type: TaskGroup
    wrong-wf-status-test-6btkq-259183855:
      boundaryID: wrong-wf-status-test-6btkq
      displayName: D
      finishedAt: "2021-09-30T05:50:49Z"
      id: wrong-wf-status-test-6btkq-259183855
      message: 'omitted: depends condition not met'
      name: wrong-wf-status-test-6btkq.D
      phase: Omitted
      startedAt: "2021-09-30T05:50:49Z"
      templateName: task
      templateScope: local/wrong-wf-status-test-6btkq
      type: Skipped
    wrong-wf-status-test-6btkq-1528210745:
      boundaryID: wrong-wf-status-test-6btkq
      children:
      - wrong-wf-status-test-6btkq-3682970524
      displayName: B(0:code:0)
      finishedAt: "2021-09-30T05:50:49Z"
      id: wrong-wf-status-test-6btkq-1528210745
      inputs:
        parameters:
        - name: code
          value: "0"
      name: wrong-wf-status-test-6btkq.B(0:code:0)
      outboundNodes:
      - wrong-wf-status-test-6btkq-3682970524
      phase: Succeeded
      progress: 1/1
      resourcesDuration:
        cpu: 4
        memory: 4
      startedAt: "2021-09-30T05:50:39Z"
      templateName: task-dag
      templateScope: local/wrong-wf-status-test-6btkq
      type: DAG
    wrong-wf-status-test-6btkq-1961584371:
      boundaryID: wrong-wf-status-test-6btkq
      children:
      - wrong-wf-status-test-6btkq-3101698926
      displayName: C(0:code:1)
      finishedAt: "2021-09-30T05:50:49Z"
      id: wrong-wf-status-test-6btkq-1961584371
      inputs:
        parameters:
        - name: code
          value: "1"
      name: wrong-wf-status-test-6btkq.C(0:code:1)
      outboundNodes:
      - wrong-wf-status-test-6btkq-3101698926
      phase: Failed
      progress: 1/1
      resourcesDuration:
        cpu: 4
        memory: 4
      startedAt: "2021-09-30T05:50:39Z"
      templateName: task-dag
      templateScope: local/wrong-wf-status-test-6btkq
      type: DAG
    wrong-wf-status-test-6btkq-3101698926:
      boundaryID: wrong-wf-status-test-6btkq-1961584371
      children:
      - wrong-wf-status-test-6btkq-259183855
      displayName: task
      finishedAt: "2021-09-30T05:50:44Z"
      hostNodeName: k3d-argo-server-0
      id: wrong-wf-status-test-6btkq-3101698926
      inputs:
        parameters:
        - name: code
          value: "1"
      message: Error (exit code 1)
      name: wrong-wf-status-test-6btkq.C(0:code:1).task
      outputs:
        exitCode: "1"
      phase: Failed
      progress: 1/1
      resourcesDuration:
        cpu: 4
        memory: 4
      startedAt: "2021-09-30T05:50:39Z"
      templateName: task
      templateScope: local/wrong-wf-status-test-6btkq
      type: Pod
    wrong-wf-status-test-6btkq-3682970524:
      boundaryID: wrong-wf-status-test-6btkq-1528210745
      children:
      - wrong-wf-status-test-6btkq-259183855
      displayName: task
      finishedAt: "2021-09-30T05:50:44Z"
      hostNodeName: k3d-argo-server-0
      id: wrong-wf-status-test-6btkq-3682970524
      inputs:
        parameters:
        - name: code
          value: "0"
      name: wrong-wf-status-test-6btkq.B(0:code:0).task
      outputs:
        exitCode: "0"
      phase: Succeeded
      progress: 1/1
      resourcesDuration:
        cpu: 4
        memory: 4
      startedAt: "2021-09-30T05:50:39Z"
      templateName: task
      templateScope: local/wrong-wf-status-test-6btkq
      type: Pod
  phase: Succeeded
  progress: 2/2
  resourcesDuration:
    cpu: 8
    memory: 8
  startedAt: "2021-09-30T05:50:39Z"
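
One way to spot the inconsistency in a dump like the one above is to list the nodes whose phase is Failed or Error and the nodes that no other node references as a child. The following is a minimal sketch under stated assumptions: the status has already been parsed into a plain dictionary, the helper names `failed_nodes` and `unreferenced_nodes` are hypothetical, and the sample data is an abridged copy of four nodes from the dump, not the full status.

```python
def failed_nodes(status):
    """Return the IDs of nodes whose phase is Failed or Error."""
    return sorted(
        node_id
        for node_id, node in status.get("nodes", {}).items()
        if node.get("phase") in ("Failed", "Error")
    )


def unreferenced_nodes(status, root_id):
    """Return the IDs of non-root nodes that appear in no children list."""
    nodes = status.get("nodes", {})
    referenced = {
        child for node in nodes.values() for child in node.get("children", [])
    }
    return sorted(
        node_id
        for node_id in nodes
        if node_id != root_id and node_id not in referenced
    )


# Abridged from the status above: the root DAG and the TaskGroups A, B, and C.
root = "wrong-wf-status-test-6btkq"
status = {
    "phase": "Succeeded",
    "nodes": {
        root: {"phase": "Succeeded", "children": [root + "-175295760"]},
        root + "-175295760": {  # A
            "phase": "Succeeded",
            "children": [root + "-225628617"],
        },
        root + "-208850998": {  # C
            "phase": "Failed",
            "children": [root + "-1961584371"],
        },
        root + "-225628617": {  # B
            "phase": "Succeeded",
            "children": [root + "-1528210745"],
        },
    },
}

if status["phase"] == "Succeeded":
    print("failed nodes:", failed_nodes(status))
print("unreferenced nodes:", unreferenced_nodes(status, root))
```

On this abridged sample both checks report the TaskGroup node for C: it has phase Failed while the workflow phase is Succeeded, and no other node lists it as a child.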

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

alexec added this to the v3.1 milestone Oct 6, 2021
alexec linked a pull request Oct 15, 2021 that will close this issue
alexec removed the triage label Oct 15, 2021
iven added a commit to iven/argo-workflows that referenced this issue Oct 15, 2021
Signed-off-by: Iven Hsu <ivenvd@gmail.com>
alexec pushed a commit that referenced this issue Oct 18, 2021
Signed-off-by: Iven Hsu <ivenvd@gmail.com>
sarabala1979 mentioned this issue Oct 21, 2021
kriti-sc pushed a commit to kriti-sc/argo-workflows that referenced this issue Oct 24, 2021
Signed-off-by: Iven Hsu <ivenvd@gmail.com>
Signed-off-by: kriti-sc <kathuriakriti1@gmail.com>
alexec mentioned this issue Nov 5, 2021
alexec pushed a commit that referenced this issue Nov 17, 2021
Signed-off-by: Iven Hsu <ivenvd@gmail.com>
sarabala1979 mentioned this issue Dec 15, 2021