Break and reentry of dynamic dependency matching. #1189

Closed · BoPeng opened this issue Jan 26, 2019 · 5 comments
BoPeng commented Jan 26, 2019

Right now, if we cannot determine the dependencies of a step, we set its input and/or depends to undetermined and execute the step anyway, for example:

input: 'index.bam'
depends: _input.with_suffix('.bam.bai')

Then, if the dependency does not exist, we raise an UnknownTarget exception and exit the step. The master process receives the exception and tries to add the dependency dynamically; it will either fail if the dependency cannot be met, or execute the step that provides it. The original step is then retried, and may break again.
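Schematically, the current break-and-reentry flow looks like the loop below. This is only an illustration: UnknownTarget is the exception mentioned above, but execute_step, resolve_target, and e.target are hypothetical names standing in for the master's actual logic.

# Schematic of the current retry loop; names other than UnknownTarget are hypothetical.
while True:
    try:
        execute_step(step)         # re-runs everything, including statements before input:
        break
    except UnknownTarget as e:
        resolve_target(e.target)   # master adds the dependency to the DAG, or fails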

This retry behavior is not ideal because SoS allows the execution of statements before input:. Therefore a step could look like

function() 
input: 'index.bam'
depends: _input.with_suffix('.bam.bai')

where function() is computationally intensive and should be executed only once. There is even the possibility of something like

depends: f'data_{random.random()}.txt'  # hypothetical: a dependency that changes on every evaluation

so that the dependency differs each time the step is executed; our break-and-reentry style will not work at all in that case. This is also related to #1186, where we cannot handle dynamic dependencies in depends. In that situation we should not break the step, because we do not know whether the step has a dependency at all, and the step should simply continue if the dependency does not exist.

So, instead of treating an unmet dependency as an exception and stopping the step, we could send the dependency to the master process as a message and wait for its reply. That is to say, we

  1. enter a step
  2. send the unmet dependency to the master and wait
  3. the master resolves the dependency, executing the step, or even an entire workflow, needed to provide it
  4. the master replies to the step with either success or failure
  5. the step continues with the dependency met

The advantage is that there is no reentry; the disadvantage is that there can potentially be hundreds of processes waiting for their dependencies to be met (think of a large, purely dynamic DAG where every node becomes a waiting process). So in the end this is related to #1056 and will make it worse.
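Below is a minimal sketch of this request-and-wait protocol, using a plain multiprocessing Pipe instead of SoS's actual worker machinery; all names in it are hypothetical.

# Hypothetical sketch of the proposed request/reply protocol, not SoS code.
from multiprocessing import Process, Pipe

def step_worker(conn, dependency):
    conn.send(('unmet_dependency', dependency))  # step 2: report and wait
    status = conn.recv()                         # blocks until the master replies (step 4)
    print(f'dependency {dependency}: {status}')  # step 5: continue (or fail)

if __name__ == '__main__':
    master_end, worker_end = Pipe()
    worker = Process(target=step_worker, args=(worker_end, 'a.bam.bai'))
    worker.start()                       # step 1: the step has been entered
    msg, target = master_end.recv()      # master receives the request
    # step 3: master adjusts the DAG and runs whatever provides `target` here
    master_end.send('resolved')          # step 4: reply with success or failure
    worker.join()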

gaow commented Jan 26, 2019

> the master resolves the dependency, executing the step, or even an entire workflow, needed to provide it

Does this mean a single-process procedure? I guess it would not be an issue if all workers can also be reassigned to take care of this (in an ideal world?), because those are computations that the workflow has to complete anyway, one way or another.

BoPeng commented Jan 26, 2019

No, this is not a performance issue. From the master's point of view, there are a bunch of workers, and some are waiting for their dependencies to be met. It adjusts the DAG, executes more workers, and lets each requesting worker know that its requests are met before it can continue. The problem is that there can be a lot of waiting processes as the DAG grows larger and larger.

This is why SoS tries really hard to analyze steps and build the DAG up front, so that most steps can be run with their dependencies already met. But we cannot resolve all cases and have to resort to dynamic dependency checking from time to time.

BoPeng commented Jan 26, 2019

Just to illustrate the point, running

[BAI: provides='{filename}.bam.bai']
_output.touch()

[BAM]
output: 'a.bam'
_output.touch()

[default]
input: 'a.bam'
depends: _input.with_suffix('.bam.bai')

would generate

INFO: Running BAM:
INFO: output:   a.bam
INFO: Running default:
INFO: Target unavailable: a.bam.bai
INFO: Running BAI:
INFO: output:   a.bam.bai
INFO: Running default:
INFO: Workflow default (ID=66be5587886ffbdf) is executed successfully with 3 completed steps.

So BAM is run first because its dependency is static. Then default is run with an undetermined depends; a.bam.bai does not exist, so default is terminated. BAI is run to generate a.bam.bai, and then default is run again, this time with both dependencies met.

BoPeng commented Jan 26, 2019

OK, with the last patch, the behavior of the example becomes

INFO: Running BAM:
INFO: output:   a.bam
INFO: Running default:
INFO: Target unavailable: a.bam.bai
INFO: Running BAI:
INFO: output:   a.bam.bai
INFO: Workflow default (ID=66be5587886ffbdf) is executed successfully with 3 completed steps.

That is to say, default will be run only once.

Some tests fail and I do not know the exact side effects of this change, but we will see how this goes.

BoPeng pushed a commit that referenced this issue Jan 27, 2019
BoPeng commented Jan 27, 2019

There can be a negative impact on performance, but I believe this is the "correct" way to handle a dynamic DAG. The performance issue could be addressed by

  1. Finding some way to deal with idle processes during execution (#1056), although this does not look easy.
  2. Trying harder to resolve targets statically. For example, if input is statically defined, maybe depends: _input.with_suffix(...) could be resolved statically as well; right now input and depends are resolved separately. A sketch of this idea follows the list.
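To illustrate what such static resolution could look like, here is a minimal sketch, assuming a plain pathlib stand-in for SoS's path type; resolve_depends_statically is a hypothetical name, not an SoS API.

from pathlib import PurePath

# Hypothetical: evaluate the step's `depends` expression at DAG-construction
# time, when its `input` is a known, static path.
def resolve_depends_statically(static_input: str) -> str:
    # mirrors `_input.with_suffix('.bam.bai')` for a known input
    return str(PurePath(static_input).with_suffix('.bam.bai'))

print(resolve_depends_statically('a.bam'))  # -> a.bam.bai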
