Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the timeout configurable for acquiring the DistributedLock in ContinueRunningWorkflows #4678

Merged

Conversation

gersti07
Copy link

@gersti07 gersti07 commented Dec 4, 2023

Currently in a multi node environment, a running Workflow on one node can block the ResumeWorkflow Startup on another node.

This PR adds an additional configuration setting to be able to have a custom timeout for acquiring the DistributedLock in the ContinueRunningWorkflows Startup, which resumes all running or idling Workflows.

With this setting i can define a short Timeout like 5-10s before i stop acquiring the lock.

If i can not acquire the lock in a short timespan, i assume that the other node is running this workflow, and i can skip resuming this workflow.
This design would potentially leave 2 edgecases unhandled:

  • the lock acquisistion is very slow for different reasons (network problems, unresponsive DB or similar)
  • Potentially: another instance hasn't correctly disposed its DB Lock

I choose the default timeout of 1h like before on the DistributedLockTimeout, to not break stuff, but maybe a default timeout of 5-10 seconds would be good.

see also: #4654

…flows Startup. Skip resuming a running or idling workflow on startup after the configured 'ContinueWorkflowsOnStartupTimeout'
@sfmskywalker sfmskywalker changed the base branch from master to 2.13.x December 13, 2023 08:29
@sfmskywalker
Copy link
Member

That all makes sense, thank you!

@sfmskywalker sfmskywalker merged commit 0e478e5 into elsa-workflows:2.13.x Dec 13, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants