Add cwd argument to addWorkflowTask #21

Open
virajbdeshpande opened this issue Jan 3, 2018 · 2 comments

Comments

@virajbdeshpande

Currently, I can use the "cwd" argument with "addTask", as shown in the cwd demo, but passing it to "addWorkflowTask" fails with an "unexpected keyword argument" error.

@ctsa
Contributor

ctsa commented Jan 4, 2018

The client API docs may help clarify which arguments each method accepts:

http://illumina.github.io/pyflow/WorkflowRunner_API_html_doc/index.html

We could potentially add this for addWorkflowTask, but what are the semantics you're looking for in this case? Could the same thing be accomplished with os.chdir(path) at the top of the added workflow instance?

@virajbdeshpande
Author

virajbdeshpande commented Apr 3, 2018

Thanks.

Here is an example use case. You have a dataset of multiple samples (parent workflow) and you want to run multiple analyses for each sample, with each sample in a different directory (subworkflow). Each analysis gets its own workflow (subsubworkflow) and its own subdirectory within the sample directory.

Let's say rootdir is the directory where we run the script/parent workflow and the cwd for the subworkflow is path. Then the semantics for the use case above would be as follows:

  1. If I set cwd=path when calling subworkflow, the working directory for the subsubworkflow should automatically be path and not rootdir, unless subworkflow changes it with the cwd argument. In short, any subworkflow should be oblivious to rootdir and inherit cwd only from its parent.
  2. It is not clear to the user whether os.chdir(rootdir) is required at the end of the subworkflow, or whether the parent workflow will continue to run in rootdir anyway. Encoding the cwd in an argument makes it clear that the user does not need to switch back.
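The two rules above can be sketched as a small stand-alone model. This is only an illustration of the requested semantics, not pyflow code: effective_cwd is a hypothetical helper, the /data/run path is made up, and resolving a relative cwd against the parent's directory is an assumption of this sketch.

```python
import os


def effective_cwd(parent_cwd, cwd=None):
    """Directory a (sub)workflow would run in under the proposed rule.

    Hypothetical helper, not part of pyflow. An explicit cwd wins
    (a relative value is resolved against the parent's directory, which
    is an assumption of this sketch); otherwise the parent's directory
    is inherited.
    """
    if cwd is None:
        return parent_cwd
    return os.path.normpath(os.path.join(parent_cwd, cwd))


# Made-up launch directory of the parent workflow.
rootdir = os.path.normpath("/data/run")

# Point (1): subworkflow called with cwd=path, subsubworkflow inherits it.
sample_dir = effective_cwd(rootdir, "2015-2802")
inherited_dir = effective_cwd(sample_dir)

# The subsubworkflow may still override its directory with its own cwd.
analysis_dir = effective_cwd(sample_dir, "fastq_cat")

# Point (2): nothing mutates process state, so the parent's directory
# is unchanged after the subworkflow has been set up.
parent_dir_afterwards = rootdir
```

Because the directory is computed from arguments rather than set with os.chdir, there is nothing to switch back at the end of the subworkflow.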

For point (1), in the current version, subsubworkflow still runs in the rootdir even if I do os.chdir(path) within subworkflow.

For point (2), I confirmed that tasks race with each other when os.chdir is used in a local run. For example, here are two directory structures created by running the same pyflow script twice:

RUN1: Correct structure
./2015-2802/fastq_cat
./2015-2802
./2015-2799/fastq_cat
./2015-2799

RUN2: Incorrect structure
./2015-2802/fastq_cat
./2015-2802/2015-2799/fastq_cat
./2015-2802/2015-2799
./2015-2802
./2015-2799
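The misplaced directory in RUN2 is consistent with how os.chdir works: it changes the working directory of the whole process, so every thread sees the change. The sketch below uses only the standard library and assumes, without confirming it from the pyflow source, that concurrently running sub-workflows share one process. The two worker functions are hypothetical stand-ins, and the events force the unlucky interleaving that a real run would hit only some of the time.

```python
import os
import shutil
import tempfile
import threading

# Temporary stand-in for rootdir, with one directory per sample.
root = os.path.realpath(tempfile.mkdtemp())
for sample in ("2015-2799", "2015-2802"):
    os.mkdir(os.path.join(root, sample))

original_cwd = os.getcwd()
entered = threading.Event()  # first worker has changed directory
created = threading.Event()  # second worker has created its directory


def first_worker():
    # Stand-in for a subworkflow that calls os.chdir(path) at the top.
    os.chdir(os.path.join(root, "2015-2802"))
    entered.set()
    created.wait()  # stay in the sample directory while the other worker runs


def second_worker():
    # Stand-in for another subworkflow that uses a path relative to rootdir.
    entered.wait()
    os.makedirs(os.path.join("2015-2799", "fastq_cat"))
    created.set()


try:
    os.chdir(root)
    threads = [threading.Thread(target=first_worker),
               threading.Thread(target=second_worker)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
finally:
    os.chdir(original_cwd)

# The relative path was resolved inside the other sample's directory.
misplaced = os.path.isdir(os.path.join(root, "2015-2802", "2015-2799", "fastq_cat"))
intended = os.path.isdir(os.path.join(root, "2015-2799", "fastq_cat"))

shutil.rmtree(root)
```

Here misplaced ends up true and intended false, which matches the ./2015-2802/2015-2799/fastq_cat entry in RUN2.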

Do you think this will be fixed any time soon? Alternatively, I can switch to using absolute paths everywhere within the script and only run shell commands for external tools through addTask(cwd). It would be easier to write a bash script that launches the Pyflow script separately for each sample, but that defeats the purpose of using Pyflow.
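A minimal sketch of the absolute-path workaround, using only the standard library: subprocess.run(..., cwd=...) stands in here for addTask with its cwd argument, and the temporary directory stands in for rootdir. The point is that the working directory is set per child process, so the launching process never calls os.chdir.

```python
import os
import subprocess
import sys
import tempfile

# Temporary stand-in for rootdir; sample names are taken from the runs above.
rootdir = os.path.realpath(tempfile.mkdtemp())
samples = ["2015-2799", "2015-2802"]

cwd_before = os.getcwd()
reported = {}
for sample in samples:
    # Build every path from rootdir so nothing depends on the process cwd.
    workdir = os.path.join(rootdir, sample, "fastq_cat")
    os.makedirs(workdir)

    # cwd applies only to the child process, much like a per-task cwd.
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getcwd())"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,
    )
    reported[sample] = os.path.realpath(result.stdout.strip())

cwd_after = os.getcwd()
```

Each child reports its own fastq_cat directory, while the parent's working directory is the same before and after the loop.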


2 participants