Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for running Dockstore page URLs and TRS IDs #5102

Merged
merged 60 commits into from
Nov 13, 2024

Conversation

adamnovak
Copy link
Member

@adamnovak adamnovak commented Sep 24, 2024

This adds support for resolving Dockstore page URLs and TRS IDs that look Dockstore-y to actual workflow file URLs.

This enables running:

toil-cwl-runner "https://dockstore.org/workflows/github.com/dockstore-testing/md5sum-checker:master?tab=files" \
https://raw.githubusercontent.com/dockstore-testing/md5sum-checker/refs/heads/master/md5sum/md5sum-input-cwl.json
toil-wdl-runner --outputDialect miniwdl "#workflow/github.com/dockstore/bcc2020-training/HelloWorld" \
'{"hello_world.hello.myName": "https://raw.githubusercontent.com/dockstore/bcc2020-training/refs/heads/master/wdl-training/exercise1/name.txt"}'

Any top-level workflow URL/path starting with https://dockstore.org/workflows/ or #workflow/ will be replaced with the url that Dockstore provides for the workflow's descriptor. Usually this goes to Github.

There's some smart default version selection for main, master, or the only available version, and :-delimited versions are supported in the Dockstore URL or at the end of the TRS ID.

This only supports Dockstore; no other TRS endpoint can be queried. This is also not general support for fetching workflow code over TRS; I couldn't find a good way to get the URL that Dockstore actually hotlinks the TRS IDs on its web pages to, which includes the path within the repo to the workflow file, which would be required for relative imports to work, so I'm only using TRS to give me a URL to then redirect to. This also does not allow workflows to reference each other by TRS ID or Dockstore URL; it only applies to the top-level command line input.

This fixes #5049. This also fixes #5050 as long as Dockstore is the only TRS registry we care about.

Changelog Entry

To be copied to the draft changelog by merger:

  • Toil now supports running CWL and WDL workflows from Dockstore, by using either a Dockstore page URL or TRS ID as the URL/filename of the workflow to run. Since these often contain ? or #, remember to quote them on the command line!
  • Mesos build updated
  • CWL and WDL argument parsing revised for Python 3.12.

Reviewer Checklist

  • Make sure it is coming from issues/XXXX-fix-the-thing in the Toil repo, or from an external repo.
    • If it is coming from an external repo, make sure to pull it in for CI with:
      contrib/admin/test-pr otheruser theirbranchname issues/XXXX-fix-the-thing
      
    • If there is no associated issue, create one.
  • Read through the code changes. Make sure that it doesn't have:
    • Addition of trailing whitespace.
    • New variable or member names in camelCase that want to be in snake_case.
    • New functions without type hints.
    • New functions or classes without informative docstrings.
    • Changes to semantics not reflected in the relevant docstrings.
    • New or changed command line options for Toil workflows that are not reflected in docs/running/{cliOptions,cwl,wdl}.rst
    • New features without tests.
  • Comment on the lines of code where problems exist with a review comment. You can shift-click the line numbers in the diff to select multiple lines.
  • Finish the review with an overall description of your opinion.

Merger Checklist

  • Make sure the PR passes tests.
  • Make sure the PR has been reviewed since its last modification. If not, review it.
  • Merge with the Github "Squash and merge" feature.
    • If there are multiple authors' commits, add Co-authored-by to give credit to all contributing authors.
  • Copy its recommended changelog entry to the Draft Changelog.
  • Append the issue number in parentheses to the changelog entry.

@adamnovak adamnovak linked an issue Sep 24, 2024 that may be closed by this pull request
@adamnovak
Copy link
Member Author

One idea floated by the Dockstore team was to use the Dockstore ZIP download URLs like:

https://dockstore.org/api/ga4gh/trs/v2/tools/%23workflow%2Fgithub.com%2Fbroadinstitute%2Fwarp%2FOptimus/versions/develop/WDL/files?format=zip

Instead of just going to the URL in the backing storage. This would let Dockstore actually mirror/host the workflow in case something happens to the backing storage.

@mr-c
Copy link
Contributor

mr-c commented Sep 26, 2024

TRS (and dockstore specifically) is already supported in cwltool (in fact, dockstore is the default registry)

https://cwltool.readthedocs.io/en/latest/#use-with-ga4gh-tool-registry-api

@mr-c
Copy link
Contributor

mr-c commented Sep 26, 2024

The cwltool code for TRS/dockstore is at https://github.com/common-workflow-language/cwltool/blob/55ccde7c2fe3e7899136ce8606a341e292d7050a/cwltool/resolver.py#L51 ; I'm happy to split that to a common library

@adamnovak
Copy link
Member Author

@mr-c The cwltool support for TRS came up when we were talking to the Dockstore team. It probably makes sense to have the CWL side just delegate to that.

Can the cwltool implementation handle upwards relative path references (or does CWL ban them)? On the WDL side we have a lot of import ../tools/tool.wdl in the workflows, and in its files TRS API Dockstore presents the primary descriptor as basename.ext and all the other files in the repo at higher directory levels as ../../dirname/otherfile.ext. So if you start at https://dockstore.org/api/ga4gh/v2/tools/TOOL_ID/versions/VERSION_ID/plain-CWL/descriptor/main_file.cwl and that file asks for ../otherdir/other_file.cwl, you can't apply the .. to the URL like you would if this was ordinary hypertext. That would have you fetch https://dockstore.org/api/ga4gh/v2/tools/TOOL_ID/versions/VERSION_ID/plain-CWL/otherdir/other_file.cwl, but that's not a well-formed TRS API request. The API wants to serve it at https://dockstore.org/api/ga4gh/v2/tools/TOOL_ID/versions/VERSION_ID/plain-CWL/descriptor/..%2Fotherdir%2Fother_file.cwl

@adamnovak
Copy link
Member Author

It would be good to split the TRS client stuff off into its own library, since then Toil could import it without needing the cwl extra when trying to use it on WDL.

The Dockstore team wanted to try extending TRS a bit to make it possible to get the path to the root workflow file within the repo that Dockstore is serving. You can already ask for descriptor//path/to/dir/main_file.ext (at least on Dockstore), and with a URL like that any references to .. work just like they would if it was a plain directory tree and not TRS. But to generate that URL you need the absolute path to the primary descriptor file and not just a basename.

But maybe once that's done it will be easy to have a tool -> well behaved root descriptor URL mapping function that's useful as a library.

@mr-c
Copy link
Contributor

mr-c commented Sep 27, 2024

Can the cwltool implementation handle upwards relative path references (or does CWL ban them)?

That's a good question, I haven't tested that in a TRS context.

@adamnovak
Copy link
Member Author

I've adjusted this to actually pull the workflow from Dockstore via ZIP download, instead of just asking Dockstore for a URL for the main file.

To avoid dealing with temporary directory cleanup, I'm caching all the extracted ZIPs in the XDG cache directory, identified by a hash of the names and hashes of the files that Dockstore reports. (This doesn't account for a workflow changing between Dockstore reporting the files and Dockstore providing the ZIP.) I then hunt through the extracted ZIP for exactly one file with the right basename and hash to be the primary descriptor. When Dockstore grows a way to report where the primary descriptor is supposed to be inside the ZIP, I can use that instead.

This should support any embedded binary files (which are only really possible in CWL since WDL File references in a workflow are working-directory-relative), as well as upwards imports.

@adamnovak
Copy link
Member Author

I had to fiddle with the CWL option parsing, since Python 3.12 changed how argparse deals with positional arguments vs. arguments that might belong to workflow options.

@adamnovak
Copy link
Member Author

I've now run make test tests='src/toil/test/src/jobServiceTest.py' so if it fails again on that hats will be eaten.

@adamnovak adamnovak merged commit 303a8d8 into master Nov 13, 2024
3 checks passed
@adamnovak adamnovak deleted the issues/5049-run-from-dockstore branch November 13, 2024 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TRS Support Support loading a dockstore URL as a workflow that Toil can run
3 participants