Add caching support for CWL #5187
base: master
Thank you for this @stxue1 !
I tested this PR using the CWL conformance tests; without --cachedir they still pass.
Alas, if I add --cachedir, then 134 of the conformance tests fail (though 242 tests DO pass!)
Thanks, I tested it on my machine and I think I tracked down the bug. It has to do with how we move files from cwltool to Toil: when we call cwltool internally, we relocate its outputs, and that relocation is messing things up. It seems that setting a cache directory makes cwltool use it as the directory the output files are written to. On the Toil job's side, I think we set the destination directory to the job store, so the outputs eventually get relocated out of the cache without being copied or symlinked. Since caching depends on cwltool behavior, it is probably best to either copy the files into the job store or figure out a way for the job store to be cache-aware. I have yet to find an entrypoint to control the
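To make the relocation problem concrete, here is a minimal sketch, assuming a cached output is a plain file inside the cache directory and the job store is modeled as an ordinary destination directory. The helper `export_output` and its `preserve_cache` parameter are hypothetical, not Toil or cwltool APIs. Moving the file empties the cache entry that later runs will look up; copying leaves it in place.

```python
import shutil
from pathlib import Path


def export_output(cached_file: Path, dest_dir: Path, *, preserve_cache: bool = True) -> Path:
    """Hypothetical helper: place a cached step output into a destination
    directory (standing in for the job store).

    preserve_cache=True copies the file, so the cache entry survives.
    preserve_cache=False moves it, which leaves the cache pointing at a
    file that no longer exists.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / cached_file.name
    if preserve_cache:
        shutil.copy2(cached_file, target)  # cache entry stays where it was
    else:
        shutil.move(str(cached_file), str(target))  # cache entry is removed
    return target
```

A symlink from the destination back into the cache would also preserve the entry while saving space, at the cost of the job store depending on the cache directory staying intact.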
We can add one on the cwltool side, if needed.
I believe I was able to find an entrypoint.
As of 68f88e0, 128 tests fail.
@mr-c Was this run with a clean cache directory? If the cache directory was populated by the previous, broken version of toil-cwl-runner, the runner will look up files from those earlier cached runs, and they won't exist. I ran some of the tests on my machine and they seem okay so far. Also, how many tests were run in parallel? It could be a synchronization issue.
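If parallel runs do race on the cache directory, one common mitigation is to publish each cache entry atomically, so a concurrent reader sees either no entry or a complete one. The sketch below assumes each entry is a single file named by a cache key; `publish_cache_entry` is a hypothetical helper, not part of Toil or cwltool, and whether the failures here are actually caused by a race is unconfirmed.

```python
import os
import tempfile
from pathlib import Path


def publish_cache_entry(cache_dir: Path, key: str, data: bytes) -> Path:
    """Hypothetical helper: write a cache entry so concurrent readers never
    observe a partially written file."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    final_path = cache_dir / key
    # Write to a uniquely named temporary file in the same directory so the
    # final rename does not cross filesystems.
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir, prefix=f".{key}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
            tmp_file.flush()
            os.fsync(tmp_file.fileno())
        # os.replace is atomic on POSIX when both paths are on one filesystem.
        os.replace(tmp_name, final_path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return final_path
```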
Clean directory,
FYI: Here is my invocation run from the root of a checkout of https://github.com/common-workflow-language/cwl-v1.2
Running serially (not in parallel), I still get 127 failures with
Oddly enough, running the same command on my machine (but without
I'll run it again with podman and see if anything changes.
Resolves #4298
Adds support for the cwltool-equivalent --cachedir option. This should make toil-cwl-runner cache-aware, so it reuses previous steps when possible and properly restarts when there are new changes. This is different from the --restart flag: jobs previously run with --cachedir can be rerun with --cachedir, but not with --restart. --restart should be used to rerun failed jobs that should now succeed. If the CWL needs editing, caching should be used instead, although it takes significantly more storage space than the default behavior. Ideally, caching should only be used for development purposes.

Changelog Entry
To be copied to the draft changelog by merger:
toil-cwl-runner: Use --cachedir [dir] to enable caching and rerun previously cached jobs.