You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When running workflows in parallel with caching enabled, there is an observed race condition where WorkflowDefinitionActivity instances fail to execute properly due to internal state changes. This issue particularly arises when the cached workflow definitions expire while a workflow instance is still executing, leading to discrepancies between the workflow graph in the execution context and newly materialized workflow definition objects.
Problem Details
During the execution of workflows:
A WorkflowExecutionContext is created.
This context initializes a Graph of ActivityNodes.
Each ActivityNode references an IActivity instance derived from the Workflow object.
Lookups are performed by IActivity instance into the Graph.
Issues occur when:
The cached Workflow entry is evicted and a new Workflow object is materialized during ongoing workflow execution.
This results in WorkflowDefinitionActivity's InitializeAsync method being invoked more than once, causing the Root property to be overwritten with a new instance.
The newly scheduled execution of this Root instance fails as it does not match the existing workflow graph.
Symptoms
Workflows suddenly stop executing with no errors or exceptions.
This problem may still occur with caching disabled if there's code that produces new Workflow objects.
Reproduction and Findings
A component test was designed to reproduce this race condition by sending numerous HTTP requests in parallel while invalidating the cache.
The test confirmed that when WorkflowDefinitionActivity dynamically sets its Root property based on the cached workflow definition, concurrent executions can interfere with each other, leading to execution failures.
Solution
The fix involves:
Ensuring that workflow objects and their collections of activities, once loaded into memory, do not change their internal state.
Throwing an exception when an activity scheduled for execution is not part of the workflow graph.
Optimizing the building of the workflow graph by constructing it concurrently with the Workflow object materialization, thereby reducing the overhead and potential for error in subsequent executions.
The solution has been validated by both the component test and practical applications within the Elsa reference app.
Impact
This fix not only addresses the root cause of the race condition but also improves performance by reducing the frequency of graph constructions per workflow execution.
The text was updated successfully, but these errors were encountered:
Based on the detailed description of the problem, it seems that the race condition is caused by the concurrent execution of workflows where the cached workflow definitions are being evicted and new instances are materialized during the ongoing execution. This leads to inconsistencies between the workflow graph in the execution context and the newly materialized workflow definition objects.
To address this issue, the proposed solution involves making sure that once workflow objects and their collections of activities are loaded into memory, they do not change their internal state. Additionally, an exception should be thrown when an activity scheduled for execution is not part of the workflow graph. By optimizing the building of the workflow graph to occur concurrently with the Workflow object materialization, the overhead and potential for error in subsequent executions are reduced.
This solution should prevent workflows from stopping unexpectedly and improve overall performance by minimizing the need to reconstruct the workflow graph for each execution. The fix has been validated through component testing and practical application within the Elsa reference app, indicating that it effectively resolves the race condition issue.
If you are experiencing this issue, it is recommended to update to the latest version of Elsa where this fix has been implemented. If the problem persists or if you have a different scenario, please provide additional details so that further assistance can be offered.
Description
When running workflows in parallel with caching enabled, there is an observed race condition where
WorkflowDefinitionActivity
instances fail to execute properly due to internal state changes. This issue particularly arises when the cached workflow definitions expire while a workflow instance is still executing, leading to discrepancies between the workflow graph in the execution context and newly materialized workflow definition objects.Problem Details
During the execution of workflows:
WorkflowExecutionContext
is created.Graph
ofActivityNodes
.ActivityNode
references anIActivity
instance derived from the Workflow object.IActivity
instance into the Graph.Issues occur when:
WorkflowDefinitionActivity
'sInitializeAsync
method being invoked more than once, causing theRoot
property to be overwritten with a new instance.Root
instance fails as it does not match the existing workflow graph.Symptoms
Reproduction and Findings
WorkflowDefinitionActivity
dynamically sets itsRoot
property based on the cached workflow definition, concurrent executions can interfere with each other, leading to execution failures.Solution
The fix involves:
The solution has been validated by both the component test and practical applications within the Elsa reference app.
Impact
This fix not only addresses the root cause of the race condition but also improves performance by reducing the frequency of graph constructions per workflow execution.
The text was updated successfully, but these errors were encountered: