Implicit depends (data flow) does not re-execute when there is a script change #1186
Isn't this the same behavior for …? Perhaps we should check what Makefile and Snakemake do in this case.
Here is the example for make. When I change a.txt or the Makefile between make runs, make does not care.
The solution for make is to have an explicit clean step (written by the user).
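To make the make behavior above concrete, here is a simplified Python model of make's out-of-date check. A target is rebuilt only when it is missing or older than one of its prerequisites. The recipe text is never part of the decision, so editing the Makefile leaves existing targets "up to date". This is a sketch, not make's actual implementation, and the function names `is_out_of_date` and `needs_rebuild` are hypothetical.

```python
import os
from typing import Iterable, Optional


def is_out_of_date(target_mtime: Optional[float], prereq_mtimes: Iterable[float]) -> bool:
    """make-style rule: rebuild if the target is missing (mtime None)
    or if any prerequisite is newer than the target."""
    if target_mtime is None:
        return True
    return any(m > target_mtime for m in prereq_mtimes)


def needs_rebuild(target: str, prerequisites: Iterable[str]) -> bool:
    """Apply the rule to files on disk. Note what is NOT checked:
    the recipe (the commands in the Makefile) never enters the decision."""
    target_mtime = os.path.getmtime(target) if os.path.exists(target) else None
    return is_out_of_date(target_mtime, (os.path.getmtime(p) for p in prerequisites))
```

Because only modification times are compared, the only way to force a rerun after changing a recipe is to remove the targets, which is what a user-written clean step does.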
I think so. My "workaround" is to use … Yes, I was aware of the Makefile behavior, but I always thought Snakemake was different (more advanced). It turns out it is not:
save it as
change the echo part to a different number, then
Maybe it is fair not to check it, at least by default? But what if I want to check it explicitly: what can I do other than …?
Hmm, I think the workaround is now simple!
This is very nice!
The problem here is that it is difficult to know which files have been generated before. We used to have per-file signatures (a bunch of .sig files, one for each output), so we could check whether any input file had been touched. The problems were the sheer number of .sig files and performance, so we removed this feature when we consolidated the signatures.
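The per-file signature idea described above can be sketched roughly as follows. Next to each output, store a small .sig file that records a hash of the step's script together with hashes of its input files, and rerun the step whenever that record is missing or no longer matches. This is a hypothetical illustration: the helper names and the `<output>.sig` layout are assumptions, not SoS's actual signature format. It also shows the cost mentioned above, since every output carries an extra file that has to be read, hashed against, and rewritten.

```python
import hashlib
import json
import os
from typing import List


def file_md5(path: str) -> str:
    """Hash a file's contents in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def step_signature(script: str, inputs: List[str]) -> dict:
    """Hypothetical signature: the step's script text plus its input contents."""
    return {
        "script": hashlib.md5(script.encode("utf-8")).hexdigest(),
        "inputs": {p: file_md5(p) for p in sorted(inputs)},
    }


def sig_path(output: str) -> str:
    # One signature file per output, e.g. 1.txt -> 1.txt.sig (assumed naming).
    return output + ".sig"


def write_signature(output: str, script: str, inputs: List[str]) -> None:
    with open(sig_path(output), "w") as f:
        json.dump(step_signature(script, inputs), f, sort_keys=True)


def needs_rerun(output: str, script: str, inputs: List[str]) -> bool:
    """Rerun if the output or its .sig file is missing, or if the
    script or any input changed since the signature was written."""
    if not (os.path.exists(output) and os.path.exists(sig_path(output))):
        return True
    with open(sig_path(output)) as f:
        saved = json.load(f)
    return saved != step_signature(script, inputs)
```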
But when I try to "abuse" it a bit,
it gives me the error:
which is completely understandable. In some cases when
then I guess we have to resort to
Sure, we should not check it if it is not straightforward. I'm just trying to find a way to achieve it when there is a need. The boundary between …
We can certainly allow …
Well, maybe it is not clear from the MWE above, but in practice it is more like https://github.com/zouyuxin/GTEx/blob/master/pipeline/mashr_flashr_workflow.ipynb (please jump to the bottom of it). You see in Cell 11,
because I am not sure how else to mix grouped input with something shared by the groups. But if there is no grouping involved, I'd really like to use … Currently it works for Cell 11, except that it does not track the changes. I can change it to …
I meant, when you need
but it does not work because it only checks if
to actually refer to the step that produces the output. So perhaps we can implement
as the unnamed version of …
So by doing this, at least when we explicitly use …
We advertise … So in the end we can provide Makefile-style data flow with
and process-style data flow (meaning all steps will be executed) with
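The difference between the two modes can be illustrated with a toy scheduler that is independent of SoS syntax. In Makefile style, a step runs only if one of its outputs is missing or one of its inputs was regenerated earlier in the same run. In process style, every step runs. The `Step` class, `run_workflow`, and its `mode` argument are hypothetical names for this sketch, and staleness is reduced to "output file is missing" for brevity.

```python
import os
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Step:
    name: str
    inputs: List[str]
    outputs: List[str]
    action: Callable[[], None]


def run_workflow(steps: List[Step], mode: str) -> List[str]:
    """Run steps (given in dependency order) and return the names of those executed.

    mode="process":  every step runs.
    mode="makefile": a step runs only if one of its outputs is missing
                     or one of its inputs was regenerated during this run.
    """
    if mode not in ("process", "makefile"):
        raise ValueError(f"unknown mode: {mode}")
    executed: List[str] = []
    refreshed: Set[str] = set()  # files (re)generated during this run
    for step in steps:
        if mode == "process":
            run = True
        else:
            missing = any(not os.path.exists(o) for o in step.outputs)
            stale_input = any(i in refreshed for i in step.inputs)
            run = missing or stale_input
        if run:
            step.action()
            executed.append(step.name)
            refreshed.update(step.outputs)
    return executed
```

Because staleness propagates through the `refreshed` set, deleting an upstream output in Makefile mode also reruns every downstream step that consumes it.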
Sounds good! I think our interpretations of process-oriented vs outcome-oriented workflows are now converging too.
OK, pending further testing, this is what I have done:
This, however, only happens during DAG building, when depends can be resolved successfully. That is to say, while
works,
will work in the Makefile style. Basically, because …
I updated https://vatlab.github.io/sos-docs/doc/user_guide/step_dependencies.html because this seems like a conceptually important addition to the dependency-building mechanism of SoS. I also added a table summarizing the syntaxes: https://vatlab.github.io/sos-docs/doc/user_guide/step_dependencies.html#Summary
Great! It looks like the code change was drastic... sorry for the trouble, but I believe it is more intuitive and powerful than before. I think users will find it so as well, thanks to the summary table.
This is an understandable limitation, and at least I never use …
As usual, the limitations could be resolved, which in this case would make things conceptually cleaner, but there will be a cost. I will create a new ticket for it.
A lot of changes were introduced to handle the cases of dynamic depends. That is to say,
will trigger the rerun of
I somewhat doubt whether we have overdone this feature: neither GNU make nor Snakemake does this, and we are potentially making SoS run slower with these changes. However, the code is supposed to affect only dynamic depends, so it should affect only a very small proportion of use cases, and the last patch completes our claimed feature on …
For the example below:
If I run the example above, then change print(99) to print(999) to introduce a change in the script, I would expect the test step to be re-executed. But it is not. What I think is reasonable behavior: if SoS finds that 1.txt is the output of another step, then any change in that step should trigger a rerun of that step even if 1.txt already exists. If 1.txt is not the output of any other step, then nothing changes.

This is one reason I am a big fan of sos_step, because for the workflow below … if I change test, it will rerun, which is very nice. Previously we used explicit provides, but now that we seem to be promoting the use of data flow, this issue is more problematic.
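The behavior requested in this issue can be expressed as a small decision rule. If an existing target such as 1.txt is the output of some step, compare a hash of that step's current script with the hash saved when the target was last produced, and rerun the producing step on a mismatch even though the file exists. If no step produces the file, its existence is enough. This is a hypothetical Python sketch: the `producers` mapping, the saved-hash store, and the function names are assumptions, and this is not how SoS actually implements or resolves depends.

```python
import hashlib
import os
from typing import Dict


def script_hash(script: str) -> str:
    return hashlib.sha256(script.encode("utf-8")).hexdigest()


def should_rerun_producer(
    target: str,
    producers: Dict[str, str],     # target file -> script of the step that produces it
    saved_hashes: Dict[str, str],  # target file -> script hash recorded at the last run
) -> bool:
    """Decide whether the step producing `target` must run again."""
    if target not in producers:
        # Not the output of any step: nothing changes. (A missing file with
        # no producer would be reported as an error elsewhere.)
        return False
    if not os.path.exists(target):
        return True
    # Rerun if the producing step was never recorded or its script changed,
    # even though the target file already exists.
    return saved_hashes.get(target) != script_hash(producers[target])
```

With this rule, changing print(99) to print(999) in the producing step changes its hash, so that step reruns and anything depending on 1.txt can follow.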