Pre-requisites

I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
I have searched existing issues and could not find a match for this bug.
What happened? What did you expect to happen?

This seems related to #13373, but I am reporting it separately because the problem appears specifically when PodDisruptionBudgets interact with workflows that allow only a single concurrent run (effectively making them silent killers).
When a PodDisruptionBudget is applied to a CronWorkflow, the workflow will occasionally continue to report as running even after it has failed because a pod was deleted by something outside the PodDisruptionBudget's control (e.g. EKS swapping a node).
When this happens, some of the task results are left behind, as is the PodDisruptionBudget, and the Argo cron job never resolves itself. Since these workflows have concurrencyPolicy: Forbid, this means we can't run any jobs again.
In the logs, this looks like:

Created PDB resource for workflow.

with no subsequent "Deleted PDB resource for workflow." message after it.
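For context, my understanding of the lifecycle is that the controller creates a PDB named after the workflow when it starts and is expected to delete it again when the workflow finishes. Below is a minimal sketch of roughly what I believe that leaked object looks like; the buildWorkflowPDB helper and the selector label are my own assumptions for illustration, not the controller's actual code.

package main

import (
	"fmt"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// buildWorkflowPDB is a hypothetical helper showing the shape of the PDB I believe the
// controller creates for a workflow: named after the workflow and selecting its pods by label.
func buildWorkflowPDB(workflowName string, minAvailable int) *policyv1.PodDisruptionBudget {
	min := intstr.FromInt(minAvailable)
	return &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{Name: workflowName},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MinAvailable: &min,
			Selector: &metav1.LabelSelector{
				// assumption: pods are selected via the standard workflow label
				MatchLabels: map[string]string{"workflows.argoproj.io/workflow": workflowName},
			},
		},
	}
}

func main() {
	fmt.Printf("%+v\n", buildWorkflowPDB("example-cron-workflow-1234567890", 9999))
}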
When reviewing the cluster controller logs for 3.5.7, these missing-delete cases show the following failure:

Error syncing PodDisruptionBudget <workflow pdb>, requeuing: Operation cannot be fulfilled on poddisruptionbudgets.policy "<workflow pdb>": the object has been modified; please apply your changes to the latest version and try again

However, after upgrading to 3.5.8, these logs no longer show up in the controller. There are some "could not find node" messages, but I don't think that's the issue, since those logs appear both before and after the deletion time and they line up with a reported issue around noisy logging for tasks.
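For reference, the "object has been modified" failure above is a standard optimistic-concurrency conflict, which is usually transient and resolved by re-reading the object and retrying the write. Below is a minimal sketch of that client-go pattern, with a made-up touchPDB helper rather than anything taken from the controller:

package pdbexample

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// touchPDB is a hypothetical example of the usual retry-on-conflict pattern: re-fetch the
// latest PodDisruptionBudget and reapply the change whenever the update hits a conflict.
func touchPDB(ctx context.Context, clientset kubernetes.Interface, namespace, name string) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		pdb, err := clientset.PolicyV1().PodDisruptionBudgets(namespace).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		if pdb.Labels == nil {
			pdb.Labels = map[string]string{}
		}
		pdb.Labels["example.com/touched"] = "true"
		_, err = clientset.PolicyV1().PodDisruptionBudgets(namespace).Update(ctx, pdb, metav1.UpdateOptions{})
		return err
	})
}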
I looked over the code here (argo-workflows/workflow/controller/operator.go, lines 3791 to 3807 in 52cca7e):

if err != nil {
	woc.log.WithField("err", err).Error("Unable to delete PDB resource for workflow.")
	return err
}
woc.log.Info("Deleted PDB resource for workflow.")
return nil
If the deletePDB call failed, it should have produced an "Unable to delete PDB resource for workflow." message and, per the calling logic, set the workflow phase to "Error". Instead, the workflow stays in "Running" and there is no log.
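To make the expected control flow concrete, here is a toy, self-contained model of how I understand that teardown path; the operationCtx type and its methods are placeholders I made up for illustration, not the real operator API:

package main

import (
	"context"
	"errors"
	"fmt"
)

// operationCtx is a stand-in for the operator's workflow context, reduced to the two
// fields needed to show the expected behaviour.
type operationCtx struct {
	fulfilled bool
	phase     string
}

// deletePDB simulates the PDB teardown failing (stand-in for the real Kubernetes delete call).
func (woc *operationCtx) deletePDB(ctx context.Context) error {
	return errors.New("simulated delete failure")
}

// markError models the expected reaction to a failed teardown: move the workflow to "Error".
func (woc *operationCtx) markError(err error) {
	woc.phase = "Error"
}

func main() {
	woc := &operationCtx{fulfilled: true, phase: "Running"}
	if woc.fulfilled {
		if err := woc.deletePDB(context.Background()); err != nil {
			// Expected: an "Unable to delete PDB resource" log plus phase "Error".
			// Observed in our cluster: no log at all and the phase stuck in "Running".
			woc.markError(err)
		}
	}
	fmt.Println("final phase:", woc.phase) // prints "Error" under the expected behaviour
}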
I have looked through the source and logs as best I can, but I can't find anything glaring that would indicate a failure, except perhaps a crash (which the Backoff package handles and then re-crashes). The controller never restarted, and all other jobs appear to function correctly throughout this issue (and I would have assumed there would be a crash log, but maybe that goes to a different location).
Please note that we do not have this problem if we do not put pod disruption budgets on the workflow; in that case the workflow cleans itself up and restarts without issue.
Version(s)
v3.5.7, v3.5.8
Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
# IMPORTANT - this only happens in our prod EKS cluster, so I don't have a minimal workflow for reproduction. I have not been able to get it to happen in smaller scale envs.
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
spec:
  workflowSpec:
    templates:
      - name: long-task
        inputs: {}
        outputs: {}
        metadata: {}
        steps:
          - - name: long-task-to-do
              arguments: {}
              templateRef:
                name: base-workflow
                template: long-task-to-do
      - name: queue-bot-tasks-exit-handler
        inputs: {}
        outputs: {}
        metadata: {}
        steps:
          - - name: queue-bot-tasks-exit-handler
              arguments: {}
              templateRef:
                name: base-workflow
                template: exit-handler
    entrypoint: long-task
    arguments: {}
    onExit: exit-handler
    # This is a very long run process
    activeDeadlineSeconds: 21600
    podDisruptionBudget:
      minAvailable: 9999
  schedule: '*/15 * * * *'
  concurrencyPolicy: Forbid
  startingDeadlineSeconds: 0
Logs from the workflow controller
Logs from in your workflow's wait container