Break and reentry of dynamic dependency matching. #1189

Closed · BoPeng opened this issue Jan 26, 2019 · 5 comments
BoPeng commented Jan 26, 2019

Right now, if we cannot determine the dependencies of a step, we set its input and/or depends to undetermined and execute the step anyway, for example:

input: 'index.bam'
depends: _input.with_suffix('.bam.bai')

Then, if the dependency does not exist, we raise an UnknownTarget exception and exit the step. The master process receives the exception and tries to add the dependency dynamically; it will either fail if the dependency cannot be met, or execute the step that provides it. The original step is then retried, and may break again.
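Schematically, the current break-and-reentry flow looks like the loop below. This is only an illustration: UnknownTarget is the exception mentioned above, but execute_step, resolve_target, and e.target are hypothetical names standing in for the master's actual logic.

# Schematic of the current retry loop; names other than UnknownTarget are hypothetical.
while True:
    try:
        execute_step(step)         # re-runs everything, including statements before input:
        break
    except UnknownTarget as e:
        resolve_target(e.target)   # master adds the dependency to the DAG, or fails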

This retry behavior is not ideal because SoS allows the execution of statements before input:. Therefore a step could look like

function() 
input: 'index.bam'
depends: _input.with_suffix('.bam.bai')

where function() is computationally intensive and should be executed only once. There is even the possibility of something like

depends: f'data_{random.random()}.txt'  # hypothetical: a dependency that changes on every evaluation

so that the dependency differs each time the step is executed; our break-and-reentry style will not work at all in that case. This is also related to #1186, where we cannot handle dynamic dependencies in depends. In that situation we should not break the step, because we do not know whether the step has a dependency at all, and the step should simply continue if the dependency does not exist.

So, instead of treating an unmet dependency as an exception and stopping the step, we could send the dependency to the master process as a message and wait for its reply. That is to say, we

  1. enter a step
  2. send the unmet dependency to the master and wait
  3. the master resolves the dependency, executing the step, or even an entire workflow, needed to provide it
  4. the master replies to the step with either success or failure
  5. the step continues with the dependency met

The advantage is that there is no reentry; the disadvantage is that there can potentially be hundreds of processes waiting for their dependencies to be met (think of a large, purely dynamic DAG where every node becomes a waiting process). So in the end this is related to #1056 and will make it worse.
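Below is a minimal sketch of this request-and-wait protocol, using a plain multiprocessing Pipe instead of SoS's actual worker machinery; all names in it are hypothetical.

# Hypothetical sketch of the proposed request/reply protocol, not SoS code.
from multiprocessing import Process, Pipe

def step_worker(conn, dependency):
    conn.send(('unmet_dependency', dependency))  # step 2: report and wait
    status = conn.recv()                         # blocks until the master replies (step 4)
    print(f'dependency {dependency}: {status}')  # step 5: continue (or fail)

if __name__ == '__main__':
    master_end, worker_end = Pipe()
    worker = Process(target=step_worker, args=(worker_end, 'a.bam.bai'))
    worker.start()                       # step 1: the step has been entered
    msg, target = master_end.recv()      # master receives the request
    # step 3: master adjusts the DAG and runs whatever provides `target` here
    master_end.send('resolved')          # step 4: reply with success or failure
    worker.join()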

gaow commented Jan 26, 2019

> the master resolves the dependency, executing the step, or even an entire workflow, needed to provide it

Does this mean a single-process procedure? I guess it would not be an issue if all workers can also be reassigned to take care of this (in an ideal world?), because those are computations that the workflow has to complete anyway, one way or another.

BoPeng commented Jan 26, 2019

No, this is not a performance issue. From the master's point of view, there are a bunch of workers, and some are waiting for their dependencies to be met. It adjusts the DAG, executes more workers, and lets each requesting worker know that its requests are met before it can continue. The problem is that there can be a lot of waiting processes as the DAG grows larger and larger.

This is why SoS tries really hard to analyze steps and build the DAG up front, so that most steps can be run with their dependencies already met. But we cannot resolve all cases and have to resort to dynamic dependency checking from time to time.

BoPeng commented Jan 26, 2019

Just to illustrate the point, running

[BAI: provides='{filename}.bam.bai']
_output.touch()

[BAM]
output: 'a.bam'
_output.touch()

[default]
input: 'a.bam'
depends: _input.with_suffix('.bam.bai')

would generate

INFO: Running BAM:
INFO: output:   a.bam
INFO: Running default:
INFO: Target unavailable: a.bam.bai
INFO: Running BAI:
INFO: output:   a.bam.bai
INFO: Running default:
INFO: Workflow default (ID=66be5587886ffbdf) is executed successfully with 3 completed steps.

So BAM is run first because its dependency is static. Then default is run with an undetermined depends; a.bam.bai does not exist, so default is terminated. BAI is run to generate a.bam.bai, and then default is run again, this time with both dependencies met.

BoPeng commented Jan 26, 2019

OK, with the last patch, the behavior of the example becomes

INFO: Running BAM:
INFO: output:   a.bam
INFO: Running default:
INFO: Target unavailable: a.bam.bai
INFO: Running BAI:
INFO: output:   a.bam.bai
INFO: Workflow default (ID=66be5587886ffbdf) is executed successfully with 3 completed steps.

That is to say, default will be run only once.

Some tests fail and I do not know the exact side effects of this change, but we will see how this goes.

BoPeng pushed a commit that referenced this issue Jan 27, 2019
BoPeng commented Jan 27, 2019

There can be a negative impact on performance, but I believe this is the "correct" way to handle a dynamic DAG. The performance issue could be addressed by

  1. Finding some way to deal with idle processes during execution (#1056), although this does not look easy.
  2. Trying harder to resolve targets statically. For example, if input is statically defined, maybe depends: _input.with_suffix(...) could be resolved statically as well; right now input and depends are resolved separately. A sketch of this idea follows the list.
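To illustrate what such static resolution could look like, here is a minimal sketch, assuming a plain pathlib stand-in for SoS's path type; resolve_depends_statically is a hypothetical name, not an SoS API.

from pathlib import PurePath

# Hypothetical: evaluate the step's `depends` expression at DAG-construction
# time, when its `input` is a known, static path.
def resolve_depends_statically(static_input: str) -> str:
    # mirrors `_input.with_suffix('.bam.bai')` for a known input
    return str(PurePath(static_input).with_suffix('.bam.bai'))

print(resolve_depends_statically('a.bam'))  # -> a.bam.bai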
