Update docs to hide Mesos #4413
It looks like our readthedocs build is failing after #4361: https://readthedocs.org/projects/toil/builds/19574176/. I think we need to update the build steps to include the new
What's here looks fine, but there are several more places in the docs we want to touch if we want a more cohesive use of Kubernetes. For example:
- There's a bunch of examples using Mesos in toil/docs/appendices/deploy.rst (Line 73 in 1f0930b):
      $ python main.py --batchSystem=mesos …
- The autoscaling documentation and examples in toil/docs/running/cloud/amazon.rst (Line 109 in 36b54c4):
      **TOIL_APPLIANCE_SELF=quay.io/ucsc_cgl/toil:latest** --- This is optional. It specifies a mesos docker image that we maintain with the latest version of toil installed on it. If you want to use a different version of toil, please specify the image tag you need from https://quay.io/repository/ucsc_cgl/toil?tag=latest&tab=tags.
- The getting started examples in toil/docs/gettingStarted/quickStart.rst (Line 35 in 1f0930b):
      Toil supports many different batch systems such as `Apache Mesos`_ and Grid Engine; its versatility makes it
I think we want to get those, and in general all the places in the docs where we use Mesos when there's not a good reason to pick it over Kubernetes.
src/toil/utils/toilLaunchCluster.py
Outdated
if options.clusterType == "mesos":
    logger.warning('You are using a "mesos" cluster, which is no longer recommended as Toil is '
                   'transitioning to using a kubernetes-based cluster. Consider switching to '
                   '--clusterType=kubernetes.')
Should this one go first? Otherwise without the option users get both warnings.
I originally intended to show both warnings if the option isn't set, but it might be redundant now that I look at it.
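A minimal sketch of the reordering discussed above, so that each invocation logs at most one warning. It assumes the second warning fires when `--clusterType` is not given and that the unset value then falls back to `"mesos"`; that check is not shown in this thread, so `UNSET_WARNING`, its wording, and the function name `warn_about_cluster_type` are hypothetical.

```python
import logging
from argparse import Namespace
from typing import List

logger = logging.getLogger(__name__)

MESOS_WARNING = (
    'You are using a "mesos" cluster, which is no longer recommended as Toil is '
    'transitioning to using a kubernetes-based cluster. Consider switching to '
    '--clusterType=kubernetes.'
)
# Hypothetical wording: the real "option not set" message is not shown in this thread.
UNSET_WARNING = 'No --clusterType given; falling back to "mesos" (assumed default).'


def warn_about_cluster_type(options: Namespace) -> List[str]:
    """Log at most one cluster-type warning and return the messages logged."""
    logged: List[str] = []
    # Check for an explicit "mesos" first, before any default is filled in.
    if options.clusterType == "mesos":
        logged.append(MESOS_WARNING)
    # Only then handle the unset case, so the default cannot re-trigger the check above.
    if options.clusterType is None:
        logged.append(UNSET_WARNING)
        options.clusterType = "mesos"  # assumed default, for illustration only
    for message in logged:
        logger.warning(message)
    return logged
```

With the checks in this order, an unset option produces only the "not set" warning and an explicit `mesos` produces only the deprecation warning.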
@@ -382,7 +396,7 @@ For example, to launch a Toil cluster with a Kubernetes scheduler, run: ::
        --provisioner=aws \
        --clusterType kubernetes \
        --zone us-west-2a \
        --keyPairName wlgao@ucsc.edu \
I forgot to replace this in the last PR.
docs/running/cloud/amazon.rst
Outdated
@@ -276,6 +289,7 @@ Autoscaling leverages Mesos containers to provide an execution environment for t
#. Launch the leader node in AWS using the :ref:`launchCluster` command: ::

    (venv) $ toil launch-cluster <cluster-name> \
Does the kubernetes batch system not support autoscaling? I tried this with --clusterType=kubernetes, but when I run:
python sort.py aws:us-west-2:<job-store-name> \
--provisioner aws \
--batchSystem kubernetes \
--nodeTypes t2.medium \
--maxNodes 2
I get:
[2023-03-24T23:04:25+0000] [scaler ] [E] [toil.provisioners.clusterScaler] Exception encountered in scaler thread. Making a best-effort attempt to keep going, but things may go wrong from now on.
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/toil/provisioners/clusterScaler.py", line 1149, in tryRun
self.scaler.updateClusterSize(estimatedNodeCounts)
File "/usr/local/lib/python3.10/dist-packages/toil/provisioners/clusterScaler.py", line 754, in updateClusterSize
newNodeCount = self.setNodeCount(instance_type, estimatedNodeCount, preemptible=nodeShape.preemptible)
File "/usr/local/lib/python3.10/dist-packages/toil/provisioners/clusterScaler.py", line 802, in setNodeCount
raise RuntimeError('Non-scalable batch system abusing a scalable-only function.')
RuntimeError: Non-scalable batch system abusing a scalable-only function.
It does look like the kubernetes batch system doesn't implement AbstractScalableBatchSystem.
Are there other ways to dynamically spin up nodes that I am not aware of?
It looks like our Kubernetes implementation supports scaling up and down through Kubernetes's cluster autoscaler (which you get when you pass a range of numbers instead of a single number for the number of nodes you want of a given type, when launching the cluster).
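As a rough illustration of "a range of numbers instead of a single number", here is a sketch of how a node-count value could be read into minimum and maximum bounds. The `"min-max"` string format and the helper name `parse_node_count` are assumptions made for this example, not Toil's actual parsing code.

```python
from typing import Tuple


def parse_node_count(spec: str) -> Tuple[int, int]:
    """Return (min_nodes, max_nodes) for a count such as "3" or "1-4".

    The "min-max" format is an assumption for illustration only.
    """
    low, sep, high = spec.partition("-")
    min_nodes = int(low)
    # A single number collapses to equal bounds, i.e. a fixed-size node group.
    max_nodes = int(high) if sep else min_nodes
    if max_nodes < min_nodes:
        raise ValueError(f"Invalid node count: {spec!r}")
    return min_nodes, max_nodes
```

Under this reading, a fixed count leaves nothing for an autoscaler to adjust, while a range gives the cluster autoscaler room to grow and shrink the node group.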
Running the Toil-integrated autoscaler as part of the workflow needs the AbstractScalableBatchSystem functions to e.g. drain nodes to safely scale them away. The Kubernetes cluster autoscaler is supposed to take care of all of that inside of Kubernetes and not inside the individual workflows.
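The failure in the traceback above can be sketched roughly as an interface check: an in-workflow scaler can only shrink the cluster if the batch system exposes hooks such as draining a node. The classes and the `set_node_count` function below are hypothetical stand-ins written for this example, not Toil's real `AbstractScalableBatchSystem` or `clusterScaler` code; only the error message is taken from the log.

```python
from abc import ABC, abstractmethod
from typing import List


class ScalableBatchSystem(ABC):
    """Hypothetical stand-in for the scalable batch system interface."""

    @abstractmethod
    def drain_node(self, node: str) -> None:
        """Stop scheduling onto a node so it can be removed safely."""


class InWorkflowScalable(ScalableBatchSystem):
    """Stand-in for a batch system that the workflow's own scaler can manage."""

    def __init__(self) -> None:
        self.drained: List[str] = []

    def drain_node(self, node: str) -> None:
        self.drained.append(node)


class ClusterManaged:
    """Stand-in for a batch system whose scaling is left to the cluster itself."""


def set_node_count(batch_system: object, nodes: List[str], target: int) -> List[str]:
    """Shrink the node list to `target`, draining the nodes that are removed."""
    if not isinstance(batch_system, ScalableBatchSystem):
        # Same message as in the traceback above.
        raise RuntimeError("Non-scalable batch system abusing a scalable-only function.")
    for node in nodes[target:]:
        batch_system.drain_node(node)
    return nodes[:target]
```

Calling `set_node_count` with a `ClusterManaged` instance raises the `RuntimeError`, which mirrors what happens when workflow-level autoscaling options are combined with a batch system that delegates scaling to the cluster.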
@DailyDreaming says I can just merge this without another review.
We can't use astroid 3 until sphinx-autoapi releases a fix for readthedocs/sphinx-autoapi#392
* Update docs to hide Mesos (#4413) * Update docs to hide Mesos * address review comments * remove invisible characters? * replace mesos in more places * Document Kubernetes-managed autoscaling, with in-workflow Mesos autoscaling as deprected * Reword some documentation and messages * Chase out more Mesoses * Don't insist on processes actually running promptly in parallel * Ask for a compatible set of Sphinx packages * Keep back astroid We can't use astroid 3 until sphinx-autoapi releases a fix for https://github.com/readthedocs/sphinx-autoapi/issues/392 --------- Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Avoid concurrent modification in cluster scaler tests (#4600) This will fix #4599 by making the mock leader thread safe. * Add String to File functionality into toil-wdl-runner (#4589) * monkeypatch coerce for workflow related nodes * Fix task inputs string coerce * Disable kubernetes * Comment out cwl kubernetes * Maybe markers are wrong and comment out cactus-on-kubernetes * Add docstrings to changed functions + change input list to dict * Deal with nonetype --------- Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Separate out integration tests to run on a schedule (#4612) * Reorganize tests and move integration tests to scheduled pipeline runs * Also handle tags * Add config file support (#4569) * Centralize defaults * Add requirements * Grab logLevel grabbed logLevel used to be the default in Config(), so grab effective logLevel that is set * Satisfy mypy mypy might still complain about missing stubs for configargparser though * Fix wrong default * add config tool * temp fix config sets defaults but so does argparse, runs twice in workflows but deals with tests * Fix create_config for tests instead * Fix setting of config defaults * Go back to previous method, create defaults at init * Fix default cli options set * Centralize, config util, and read properly * Fix type hinting to support 3.9 * mypy * Fix cwl edge case * Fix tests * fix typos, 
always generate config, fix some tests * Remove subprocess as maybe tests are flaky on CI with it? * just run quick_test_offline * make CI print stuff * Harden default config creation against races * Cleanup and argument renaming * Fix bad yaml and toil status bug * Fix mypy * Change behavior of --stats and --clean * Change test behavior as options namespace and config now have the same behavior * Put forgotten line ouch * Batchsystem, requirements, fixes for tests * Mypy conformance * Mypy conformance * Fix retryCount argument and kubernetesPodTimeout type * Only run batchsystem and slurm_test tests on CI * Whoops, this implementation never worked * Add pyyaml to requirements for slurm to pass * Add rest of gitlab CI back and run all tests * Update stub file to be compatible with updated mypy * Fix environment CLI option * Update provisioner test to use configargparse * Code cleanup and add jobstore_as_flag to DefaultArgumentParser etc * Fix toil config test * Add suggestions * Deprecate options, add underscore CLI options only for newly deprecated options * Update docs/argparse help and fix bug with deprecated options also make most generic arg as default for runLocalJobsOnWorkers * Add config file section to docs * Remove upper bound for ruamel requirements * Remove redundancies and improve disableCaching's destination name * Update src/toil/batchSystems/kubernetes.py Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Remove redundant line in status util * Remove comments in configargparse stub * Workaround to get action=append instead of nargs and get proper backwards compatibility Fix wrong name for link_imports and move_exports, remove new unused functions * Import SYS_MAX_SIZE from common rather than duplicating it * Mypy and syntax errors * Move config options back to the old naming syntax * Change names for link_imports and move_exports to camelCase options * Fix formatting * Bring back old --restart and --clean functionality where they collide and raise 
an error * Make debug less spammy and remove unused types * Disable kubernetes temporarily * Revert changes to --restart and --clean collision * Typo in tests * Change some comments and add member fields to config * Fix pickling error when jobstate file doesnt exist and fix threading error when lock file exists then disappears (#4575) Co-authored-by: Brandon Walker <walkerbd@dali1.dali.hpc.ncats.nih.gov> Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Reduce the number of assert statements (#4590) * Change all asserts to raising errors for central toil files Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Fix mypy and update docs to match options in common * Update src/toil/common.py Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> --------- Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> Co-authored-by: Brandon Walker <43654521+misterbrandonwalker@users.noreply.github.com> Co-authored-by: Brandon Walker <walkerbd@dali1.dali.hpc.ncats.nih.gov> * take any nvidia-smi exception as not having gpu (#4611) Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Make WDLOutputJob collect all task outputs (#4602) Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Ensure sibling files in toil-wdl-runner (#4610) * Ensure sibling files stay sibling files when downloaded * Fix incorrect argument order * Fix directory collisions with sibling files * Make sure the `--batchLogsDir` exists if it is set (#4635) * Make sure the batch logs dir exists if it is set * Test Slurm with nonexistent --batchLogsDir * Upgrade cwltool to avoid broken galaxy-tool-util release. 
(#4639) Fixes: https://github.com/DataBiosphere/toil/issues/4638 * cwl: use the latest commit from the proposed CWL v1.2.1 branch (#4565) * Report errors in WDL using MiniWDL's error location printer (#4637) * Report errors in WDL using MiniWDL's error location printer * Decorate actual tasks with fancy WDL error reporting * Slap WDL error reporting on main * Remove banned ignore comment * Support Python3.11 and drop Python 3.7 (#4646) * Remove python 3.7 and add python 3.11 and make python3.11 the main python package * Move main python package back to 3.9 * Incude python3.11 in docker * Test 3.11 in CI * Add python3.11 to CI dockerfile * Add 3.11 to setup.py and debugging statements * Python 3.7 backwards compatibility * Update to py 3.12 and run 3.12 on gitlab CI * Comment out fstring and try importlib * Debug lint * Ensure mypy is using python3.12 * Print python version beofre mypy * Fix virtualenv, pip for python3.12 * Get rid of mesos tests/builds * 3.12 * Revert debug change * Go back to 3.11 and update docker package to make requests work again * use an available htcondor package closest to 3.10 version * update htcondor for all * get pip for all python versions * get virtualenv for all python versions * needs specific ordering * Separate mesos tests * remove 3.7 from CI image * Remove debug statement from makefile * Fix configargparse in CWL (#4618) * Parse config file separately from rest of args * Mypy * update configargparse stub * Dont try to eat cwl arguments * Use simpler workaround * Revert to just CWL * Change REMAINDER to "*", add help statements and test command line inputs * Remove extradockergroup name * Declare type * Add proper relative path to cwl file * Remove unnecessary test --------- Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Update ruamel-yaml requirement from <0.17.33,>=0.15 to >=0.15,<0.18.4 (#4659) Updates the requirements on [ruamel-yaml]() to permit the latest version. 
--- updated-dependencies: - dependency-name: ruamel-yaml dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Fix CI Appliance Builds (#4655) * Properly build 3.11, fix dependencies and move aws stubs/mock into dev * only keep htcondor installs in appliance builds * Remove unused import * Fix extras_require syntax * Fix #3867 and try to explain but not crash when bad things happen to our mutex file (#4656) * Bump mypy from 1.5.1 to 1.6.1 (#4660) * Bump mypy from 1.5.1 to 1.6.1 Bumps [mypy](https://github.com/python/mypy) from 1.5.1 to 1.6.1. - [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md) - [Commits](https://github.com/python/mypy/compare/v1.5.1...v1.6.1) --- updated-dependencies: - dependency-name: mypy dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * type fix --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Michael R. Crusoe <michael.crusoe@gmail.com> * Move around reqs and move aws dev libraries to aws (#4664) * Turn batch system tests back on (#4649) This should fix #4648 by turning on the batch system tests again. The Mesos-specific ones are already moved elsewhere. Co-authored-by: Lon Blauvelt <lblauvel@ucsc.edu> * Bump miniwdl from 1.10.0 to 1.11.1 (#4669) Bumps [miniwdl](https://github.com/chanzuckerberg/miniwdl) from 1.10.0 to 1.11.1. - [Release notes](https://github.com/chanzuckerberg/miniwdl/releases) - [Commits](https://github.com/chanzuckerberg/miniwdl/compare/v1.10.0...v1.11.1) --- updated-dependencies: - dependency-name: miniwdl dependency-type: direct:production update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Move TES batch system to a plugin (#4650) * Implement new batch system finding API and plugin scan * Satisfy MyPy * Implement deprecation for the old constants * Get plugin loader to actually load, and drop TES * Remove TES Kubernetes setup we don't use * Stop asking for needs_tes --------- Co-authored-by: Lon Blauvelt <lblauvel@ucsc.edu> * skip unwanted networkx version (#4450) * skip unwanted networkx version * Limit to released major versions of networkx --------- Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * CWL Pipefish compatibility (#4636) * Add a bunch of value resolving logging * Quiet debugging a bit * Move default setting for workflows so it works on subworkflows * Remember to keep making a ToilFsAccess on the leader * Satisfy MyPy * Stop giving CWL containers directories full of broken symlinks * Update test to expect no symlinks * Move CWL integration tests for bioconda/biocontainers to integration test runs * Wrap mkdtemp to fix #4644 * Sort imports in example scripts * Use absolute-ized paths for work and coordination directories * Bump cwltool from 3.1.20231020140205 to 3.1.20231114134824 (#4685) Bumps [cwltool](https://github.com/common-workflow-language/cwltool) from 3.1.20231020140205 to 3.1.20231114134824. - [Release notes](https://github.com/common-workflow-language/cwltool/releases) - [Commits](https://github.com/common-workflow-language/cwltool/compare/3.1.20231020140205...3.1.20231114134824) --- updated-dependencies: - dependency-name: cwltool dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump mypy from 1.6.1 to 1.7.0 (#4684) * Bump mypy from 1.6.1 to 1.7.0 Bumps [mypy](https://github.com/python/mypy) from 1.6.1 to 1.7.0. - [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md) - [Commits](https://github.com/python/mypy/compare/v1.6.1...v1.7.0) --- updated-dependencies: - dependency-name: mypy dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * mypy 1.7.0 type updates * format modified files * remove unused imports --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Michael R. Crusoe <michael.crusoe@gmail.com> * Remove the parasol batch system. (#4678) Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Reenable Cactus on Kubernetes CI test (#4604) * Reenable kubernetes tests that don't require a local cluster, eg CWL on ARM and Cactus integration on kubernetes * Disable CWL kubernetes * enable cactus tests * Add to scheduled integration tests * Add forgotten file * Remove print statements * Remove unnecessary env var and move file * Run test when updated Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * update gitlab * Fix typo in path * Add virtualenv and prepare build to gitlab CI to run tests properly * add gitlab setup scripts * add gitlab setup scripts --------- Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Only count output file usage when using the file store (#4692) * Bump mypy from 1.7.0 to 1.7.1 (#4697) Bumps [mypy](https://github.com/python/mypy) from 1.7.0 to 1.7.1. 
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md) - [Commits](https://github.com/python/mypy/compare/v1.7.0...v1.7.1) --- updated-dependencies: - dependency-name: mypy dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * AWS jobStoreTest: re-use delete_s3_bucket from toil.lib.aws (#4700) Ignore errors when cleaning up the FileJobStoreTest * Make sure cwltool always knows we have an outdir to fix #4698 (#4699) * remove useage of the deprecated pkg_resources (#4701) setup.py: make clear that Python 3.7 is no longer supported Co-authored-by: Lon Blauvelt <lblauvel@ucsc.edu> * more resiliancy (#4395) * Support CWL 1.2.1 (#4682) * cwl: use the latest commit from the proposed CWL v1.2.1 branch * Double default CWL conformance test timeout * Support abs path for directory outputs * Better comment for why local paths are permitted * add relax-path-checks to CI tests --------- Co-authored-by: Michael R. Crusoe <michael.crusoe@gmail.com> Co-authored-by: Michael R. Crusoe <1330696+mr-c@users.noreply.github.com> * Remove the WDL compiler. (#4679) * Remove the WDL compiler. * Linting. * Update WDL stand-alone. * Weird linting error? 
* Cut compiler docs * Stop trying to run removed WDL compiler tests --------- Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Allow working with remote files in CWL and WDL workflows (#4690) * Start implementing real ToilFsAccess URL operations * Implement URL opening for CWL * Implement other ToilFsAccess operations without local copies * Remove getSize spelling and pass mypy * Add missing import * Remove check for extremely old setuptools * Add --reference-inputs option to toil-cwl-runner * Allow files to be gotten by URI on the nodes * Add some tests to exercise URL references * Implement URI access and import logic in WDL interpreter * Remove duplicated test * Fixc some merge problems * Satisfy MyPy * Spell default correctly * Actually hook up import bypass flag * Actually pass self test when using URLs * Make file job store volunteer for non-schemed URIs * Revert "Make file job store volunteer for non-schemed URIs" This reverts commit 3d1e8f6761bd29f5bfedfd055f025943ab6ed1b8. * Handle size requests for bare filenames * Handle polling for URL existence * Add a make test_debug target for getting test logs * Add more logging to CWL streaming tests * Contemplate multi-threaded access to the CachingFileStore from user code * Allow downloading URLs in structures, and poll AWS directory existence right * Update tests to a Debian with ARM Docker images * Undo permission changes * Add missing import --------- Co-authored-by: Michael R. Crusoe <1330696+mr-c@users.noreply.github.com> * upgrade to cwltool 3.1.20231207110929 (#4707) Co-authored-by: Michael R. Crusoe <michael.crusoe@gmail.com> * Update docker requirement from <7,>=3.7.2 to >=3.7.2,<8 (#4713) Updates the requirements on [docker](https://github.com/docker/docker-py) to permit the latest version. 
- [Release notes](https://github.com/docker/docker-py/releases) - [Commits](https://github.com/docker/docker-py/compare/3.7.2...7.0.0) --- updated-dependencies: - dependency-name: docker dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Implement a better config file system for CWL/WDL options (#4666) * Strip leading whitespace from WDL commands (#4720) * Strip leading whitespace from WDL commands * Work around MiniWDL's wrong type * Add __init__.py to options folder (#4723) * Make cwl mutually exclusive groups exist only when cwl is not suppressed (#4725) * Point CI at the new public URLs for stuff we host * Bump mypy from 1.7.1 to 1.8.0 (#4731) Bumps [mypy](https://github.com/python/mypy) from 1.7.1 to 1.8.0. - [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md) - [Commits](https://github.com/python/mypy/compare/v1.7.1...v1.8.0) --- updated-dependencies: - dependency-name: mypy dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Tolerate a failed AMI polling attempt (#4727) * Tolerate a failed AMI polling attempt * Start marking Internet-relates tests to keep them out of the offline step * Update flake8 requirement from <7,>=3.8.4 to >=3.8.4,<8 (#4738) Updates the requirements on [flake8](https://github.com/pycqa/flake8) to permit the latest version. - [Commits](https://github.com/pycqa/flake8/compare/3.8.4...7.0.0) --- updated-dependencies: - dependency-name: flake8 dependency-type: direct:development ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Fix --printJobInfo (#4709) * Add a test for --printJobInfo * Move file name listing into the FileJobStore so it can sort of work again * Fix Toil subcommand usage to include the subcommand * Satisfy MyPy * Fix =True syntax and find files even when their jobs are gone or they are no-job * Add a test for actually rerunning a job * Make the test for running a job alone pass * Address review comments --------- Co-authored-by: Lon Blauvelt <lblauvel@ucsc.edu> * remove extraneous dependency on old 'mock' (#4739) 'mock' has been integrated in the standard library as 'unittest.mock' * Improve WDL documentation (#4732) * Fix code block boundary * Make the CWL quickstart the main one * Talk about Python workflows instead of user scripts * Chase away all the Sphinx warnings so we know the docs should look right * Fail the docs build if the docstrings don't parse cleanly * Encourage installing with cwl and wdl extras * Qualify Python development * Reorganize docs to plug the workflow languages more * Talk a bit about WDL * Add conformance test and install info * Stop trying to draw inheritance diagrams since RtD doesn't give us a dot anyway --------- Co-authored-by: Lon Blauvelt <lblauvel@ucsc.edu> * Fix scheduled CI tests (#4742) * Actually filter to Mesos tests Also run Mesos tests of we touch Mesos. It looks like https://github.com/DataBiosphere/toil/pull/4646 added a bunch of Mesos test run steps but didn't include tests= so they just run all tests, even if the dependencies aren't there. * Don't import boto when it may not be installed * Stop pinning very old setuptools and pyyaml This basically reverts 60096d89eb7233b2791000da87a9754399fcb9c4 and should let us use a setuptools that is new enough for the Python versions we are using. 
* Run all tests on -fix-ci branches * Put Mesos AWS tests in the Mesos step * Improve WDL documentation (#4732) * Fix code block boundary * Make the CWL quickstart the main one * Talk about Python workflows instead of user scripts * Chase away all the Sphinx warnings so we know the docs should look right * Fail the docs build if the docstrings don't parse cleanly * Encourage installing with cwl and wdl extras * Qualify Python development * Reorganize docs to plug the workflow languages more * Talk a bit about WDL * Add conformance test and install info * Stop trying to draw inheritance diagrams since RtD doesn't give us a dot anyway --------- Co-authored-by: Lon Blauvelt <lblauvel@ucsc.edu> * Indent docstring to fix doc build failure --------- Co-authored-by: Lon Blauvelt <lblauvel@ucsc.edu> * Update EC2 instances and EC2 update script. (#4745) * Update EC2 instances and EC2 update script. * Minor details. * Clean up. * Linting. * Ignore a perfectly good import. --------- Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Log more usefully for CWL workflows (#4736) * Log files going in and out and the various CWL workflow phases * Log CWL job executions to the leader just as text; replace logToMaster * Log runtime context name * Revise other logging messages to improve CWL logs * Fix test to allow trailing newline --------- Co-authored-by: Lon Blauvelt <lblauvel@ucsc.edu> * Don't mark inputs (or outputs) executable for no reason (#4728) * Be explicit about executable representation * Add testing to make sure outputs aren't unexpecteldy executable * Let js expressions in the scatters take a long time to start Node --------- Co-authored-by: Lon Blauvelt <lblauvel@ucsc.edu> * Bump cwltool from 3.1.20231207110929 to 3.1.20240112164112 (#4751) Bumps [cwltool](https://github.com/common-workflow-language/cwltool) from 3.1.20231207110929 to 3.1.20240112164112. 
- [Release notes](https://github.com/common-workflow-language/cwltool/releases) - [Commits](https://github.com/common-workflow-language/cwltool/compare/3.1.20231207110929...3.1.20240112164112) --- updated-dependencies: - dependency-name: cwltool dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update flake8-bugbear requirement from <24,>=20.11.1 to >=20.11.1,<25 (#4752) Updates the requirements on [flake8-bugbear](https://github.com/PyCQA/flake8-bugbear) to permit the latest version. - [Release notes](https://github.com/PyCQA/flake8-bugbear/releases) - [Commits](https://github.com/PyCQA/flake8-bugbear/compare/20.11.1...24.1.15) --- updated-dependencies: - dependency-name: flake8-bugbear dependency-type: direct:development ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * add pure Python fallback for getDirSizeRecursively() (#4753) * add pure Python fallback for getDirSizeRecursively() * Fix spelling --------- Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Update version_template.py for release * Store chaining information just once (#4737) * Keep around old names of chained jobs * Get rid of chainedJobs * Just pull log names from jobDesc * Use an accessor to just get the whole chain together * Imporve comment and formatting * Fix wrong name in import * Stop marking HTTP registry as insecure (#4757) This should fix #4756 and hopefully the intermittent test failures where buildkit tries to speak HTTPS to our Docker cache. 
* CWL: don't clear out user-provided values for the --default-container (#4730) * CWL: don't clear out user-provided values for the --default-container Fixes https://stackoverflow.com/questions/77684785/toil-cwl-runner-not-using-default-container-option-with-singularity-option * mypy --strict for the CWL tests * soften cap on ruamel.yaml dependency * remove ruamel.yaml.string dependency for a simpler solution (#4760) * Try to mitigate filling up the coordination directory (#4749) * Complain more usefully about a bad coordination directory * Don't pick tiny filesystems for coordination, and organize everything in toilwf- directories * Put cleanup arena so it shares a prefix with but isn't in the directory it protects * Fix variable name * Don't catch any old thing, which doesn't work anymore anyway * Allow toil-wdl-runner to run on Kubernetes and Mesos (#4754) * Change docker security rules, remove --containall on singularity, add tzdata as dependency * remove link for tzdata and add integration test * Add test to gitlab and remove provisioner option --------- Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Ship User Logs to Leader (#4755) * Document the stats and logging design as it stands * Plug WDL task stdout and stderr into the --writeLogs system as new user streams * Log CWL and WDL output and error logs that aren't captured by the workflow itself * Name CWL and WDL log files usefully This goes back to using displayName for stats and logging. It also adds a WDL "task path" which is like the namespace but includes numbers for scatters, and uses that to name the log files. 
* Log more to illustrate https://github.com/moby/buildkit/issues/4458 * Document the user log system architecture * Satisfy mypy * Go back to using displayName for stats again * Clarify CWL output handling * Revise test to allow new '_' * Update pytest requirement from <8,>=6.2.1 to >=6.2.1,<9 (#4772) Updates the requirements on [pytest](https://github.com/pytest-dev/pytest) to permit the latest version. - [Release notes](https://github.com/pytest-dev/pytest/releases) - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst) - [Commits](https://github.com/pytest-dev/pytest/compare/6.2.1...8.0.0) --- updated-dependencies: - dependency-name: pytest dependency-type: direct:development ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Add workflow to automatically update PRs when other PRs merge (#4774) * Stop complaining about XDG_RUNTIME_DIR (#4769) * Update setuptools requirement from <69,>=65.5.1 to >=65.5.1,<70 (#4693) Updates the requirements on [setuptools](https://github.com/pypa/setuptools) to permit the latest version. - [Release notes](https://github.com/pypa/setuptools/releases) - [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst) - [Commits](https://github.com/pypa/setuptools/compare/v65.5.1...v69.0.0) --- updated-dependencies: - dependency-name: setuptools dependency-type: direct:development ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Lon Blauvelt <lblauvel@ucsc.edu> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Stop failing auto-update workflow on every merge conflict * read the docs: enable generating graphs like inheritance trees. (#4734) * read the docs: enable generating graphs like inheritance trees. 
* Add Graphviz to CI image --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Docs: Always show Python execution using `python3` (#4764) In case a virtualenv is not used Co-authored-by: Andreas Tille <tille@debian.org> Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Make formatting do all the code (#4777) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * avoid unnecessary boto{,3} imports (#4763) Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * remove use of distutils by copying in strtobool() (#4765) * remove use of distutils by copying in strtobool() Copied code is MIT licensed https://github.com/pypa/distutils/blob/fb5c5704962cd3f40c69955437da9a88f4b28567/distutils/util.py#L340 https://github.com/pypa/distutils/blob/fb5c5704962cd3f40c69955437da9a88f4b28567/LICENSE * Add type hints and replace distutils code with our own --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Revert --disableProgress to old flag-style behavior (#4778) * Change default Singularity cache paths to be global (#4762) * Change default cache paths to piggyback off of singularity and miniwdl defaults + set cache paths on cloud to /var/lib/toil * Improve documentation * Revert block quote and bold instead * Change singularity cache directory to the right default directory --------- Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * CPU count fallback (#4780) * Fall back to 1 core when # CPUs unavailable * Apply all limits and then fall back to 1 --------- Co-authored-by: 
Theodore Ni <3806110+tjni@users.noreply.github.com> * Fix special characters in filenames with the FileJobStore (#4781) * Remove extraneous unquote * Log task standard error to the worker log if it fails and MiniWDL hasn't already logged it * Hack around having to dedent the command at the wrong time by keying on the first line * Remove extra logging and cross-checks * Add back missing line end * Work around boto stubs regression in https://github.com/python/typeshed/issues/11381 --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Update sphinx-autodoc-typehints requirement (#4784) Updates the requirements on [sphinx-autodoc-typehints](https://github.com/tox-dev/sphinx-autodoc-typehints) to permit the latest version. - [Release notes](https://github.com/tox-dev/sphinx-autodoc-typehints/releases) - [Changelog](https://github.com/tox-dev/sphinx-autodoc-typehints/blob/main/CHANGELOG.md) - [Commits](https://github.com/tox-dev/sphinx-autodoc-typehints/compare/1.24.0...2.0.0) --- updated-dependencies: - dependency-name: sphinx-autodoc-typehints dependency-type: direct:development ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Use a default log limit of 100MiB (#4788) * Use a default log limit of 100MiB * Update documented default * Require a new enough Docker to fix #4794 (#4795) * Log CWL command output inline on failure, and to logging system whether it succeeds or not (#4793) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Unify devirtualization to fix output name collisions (#4792) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Allow setting WDL container engine with --container (#4787) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Lon Blauvelt <lblauvel@ucsc.edu> * Request and handle Slurm timeout signal (#4804) * Add a Slurm termination signal for timeouts * Use SIGINT for Slurm timeouts instead of SIGTERM * Make the interrupt signal actually get to the worker process * Run worker orderly cleanup even if asked to stop * Preserve exit code from user code * Enforce failure when Slurm jobs time out (#4802) * Don't let 0 exit codes out of the Slurm batch system if the job isn't completed. 
* Add missing import * Teach Slurm and part of LSF to use the Toil exit reason system * Report unavailable exit status better * Make sure exit reasons come out as readable strings when logged on Python 3.11+ --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Fix caching being accidentally set to True instead of None (#4805) * Better stats for WDL workflows (#4770) * Split up WDL input evaluation and command execution * Rename task parts to inputs and command * Deduplicate across scatters for stats * Report CPU wait accurately with multiple cores, and improve titles Fixes #4768 * Fix memory units in stats and on Mac * Move job disk usage tracking and warning to AbstractFileStore * Save disk to stats * Fix imports and variable name * Remove duplicated stat printing code * Unify stat computation * Use the category metadata globals to drive everything and sync the width and print code * Stop coming up with negative wait when jobs don't report cores * Allow setting WDL container engine with --container * Use a default log limit of 100MiB * Update documented default * Require a new enough Docker to fix #4794 * Add a unit notion to stats * Be consistent about printing units in toil stats * Rename functions to snake_case * Improve error reporting and split cluster and normal utils * Start documenting the parts of the stats * Swap over to a stats example that is more illustrative * Fix counting the jobs per worker * Explain all the job columns and the sorting * Fix typing of jobs list * Fix documentation build * Fix white-box stats test * Move the cluster utils out of the cloud providers ToC section * Update worker.py --------- Co-authored-by: Lon Blauvelt <lblauvel@ucsc.edu> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Update EC2 instance list. (#4808) * Bump version. * Update README.rst Couple of small doc changes. 
* Respect job local-ness when chaining (#4809) * Add test to make sure local jobs don't chain to nonlocal ones * Implement chaining block for local to nonlocal * Scale down stats tutorial test to fit on small CI runners --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Fix Python 3.8 support (#4823) * Add all the supported Python versions as scheduled tests * Don't let the Docker build succeed when Toil can't run at all * Use 3.8-compatible type hints * Fix missing description on PyPI (#4820) * setuptools: Include README in the package metadata. Currently https://pypi.org/project/toil/#description is > The author of this package has not provided a project description * Makefile: use isolated builds, add dist target (sdist+wheel) and deprecate the sdist target. --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Install build (#4826) * Use a sentinel location instead of an unmodified location to mark missing files (#4818) * Use a sentinel location instead of an unmodified location to mark missing files * Fix spelling --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Bump mypy from 1.8.0 to 1.9.0 (#4830) Bumps [mypy](https://github.com/python/mypy) from 1.8.0 to 1.9.0. - [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md) - [Commits](https://github.com/python/mypy/compare/v1.8.0...1.9.0) --- updated-dependencies: - dependency-name: mypy dependency-type: direct:development update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Make sure output directory exists before using it (#4832) * Pass through statusCode to prevent infinite loop (#4829) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Add tests for environment pickling (#4837) * Add a test for the environment coming from environment.pickle over top of anything on the leader * Make sure test works with slow job stores like AWS * Bump sphinxcontrib-autoprogram from 0.1.8 to 0.1.9 (#4838) Bumps [sphinxcontrib-autoprogram](https://github.com/sphinx-contrib/autoprogram) from 0.1.8 to 0.1.9. - [Release notes](https://github.com/sphinx-contrib/autoprogram/releases) - [Changelog](https://github.com/sphinx-contrib/autoprogram/blob/master/doc/changelog.rst) - [Commits](https://github.com/sphinx-contrib/autoprogram/compare/0.1.8...0.1.9) --- updated-dependencies: - dependency-name: sphinxcontrib-autoprogram dependency-type: direct:development update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Add colored logging (#4828) * Add coloredlogs * type ignore * Fix test to get around how coloredlogs deals with handlers * Fix option, functionname, license, formatting, and colors * Remove excess datetime --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Remove unused CI test (#4843) * Measure CPU and memory usage in WDL Docker containers (#4819) * Inject code into the container like MiniWDL to get Docker CPU and memory usage * Remove not a real ref * Keep resource monitoring state in a class * Fix lingering old import * Get import name right --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Lon Blauvelt <lblauvel@ucsc.edu> * Allow debugging jobs by name (and status improvements) (#4840) * Report tag parsing errors better in case you mix up type and tag * Fix toil status per-job status report to be per-job * Shorten toil status option names * Report completely failed jobs * Rearrange per-job stats to make it easier to find runnable and failed jobs * Add printing failed jobs specifically * Stop making a config just to get status * Implement search for job by name in debug-job by cribbing from status * Document the toil status flags a bit * Write up some debug-job examples * Explain names more and drop distracting log line --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Improve exception handling to not output tracebacks (#4839) * Improve exception handling, don't output tracebacks when possible * Remove excess code in test * Fix test to use subprocess to accommodate for changed exception handling * Reword 
check_initialized() Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Move comments and make LocatorException take a prefix instead * Change config to options as it no longer exists --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Update pytest-cov requirement from <5,>=2.12.1 to >=2.12.1,<6 (#4851) Updates the requirements on [pytest-cov](https://github.com/pytest-dev/pytest-cov) to permit the latest version. - [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst) - [Commits](https://github.com/pytest-dev/pytest-cov/compare/v2.12.1...v5.0.0) --- updated-dependencies: - dependency-name: pytest-cov dependency-type: direct:development ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update docutils requirement from <0.21,>=0.16 to >=0.16,<0.22 (#4866) Updates the requirements on [docutils](https://docutils.sourceforge.io) to permit the latest version. --- updated-dependencies: - dependency-name: docutils dependency-type: direct:development ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update galaxy-util requirement from <23 to <25 (#4862) Updates the requirements on [galaxy-util](https://github.com/galaxyproject/galaxy) to permit the latest version. - [Release notes](https://github.com/galaxyproject/galaxy/releases) - [Commits](https://github.com/galaxyproject/galaxy/compare/galaxy-util-19.9.0...v24.0) --- updated-dependencies: - dependency-name: galaxy-util dependency-type: direct:production ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Update galaxy-tool-util requirement from <23 to <25 (#4861) Updates the requirements on [galaxy-tool-util](https://github.com/galaxyproject/galaxy) to permit the latest version. - [Release notes](https://github.com/galaxyproject/galaxy/releases) - [Commits](https://github.com/galaxyproject/galaxy/compare/galaxy-tool-util-19.9.0...v24.0) --- updated-dependencies: - dependency-name: galaxy-tool-util dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Michael R. Crusoe <1330696+mr-c@users.noreply.github.com> * Bump cwltool from 3.1.20240112164112 to 3.1.20240404144621 (#4870) Bumps [cwltool](https://github.com/common-workflow-language/cwltool) from 3.1.20240112164112 to 3.1.20240404144621. - [Release notes](https://github.com/common-workflow-language/cwltool/releases) - [Commits](https://github.com/common-workflow-language/cwltool/compare/3.1.20240112164112...3.1.20240404144621) --- updated-dependencies: - dependency-name: cwltool dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump gunicorn from 21.2.0 to 22.0.0 (#4871) Bumps [gunicorn](https://github.com/benoitc/gunicorn) from 21.2.0 to 22.0.0. - [Release notes](https://github.com/benoitc/gunicorn/releases) - [Commits](https://github.com/benoitc/gunicorn/compare/21.2.0...22.0.0) --- updated-dependencies: - dependency-name: gunicorn dependency-type: direct:production ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Retry Slurm interactions more (#4869) * Hook up grid engine batch systems to the normal retry system and add --statePollingTimeout * Remove extra word * Insist on understanding the Slurm states and stop if we don't * Change how we think of REVOKED and SPECIAL_EXIT * Add missing argument * Import missing exception type --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Replace use of boto with boto3 for `awsProvisioner.py` (#4859) * Take out boto2 from awsProvisioner.py * Add mypy stub file for s3 * Lazy import aws to avoid dependency if extra is not installed yet * Also lazy import in tests * Separate out wdl kubernetes test to avoid missing dependency * Add unittest main * Fix wdl CI to run separated tests * Fix typo in lookup * Update moto and remove leftover line in node.py * Apply suggestions from code review Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Apply fixes * Abstract AWS ErrorCondition server errors into a constant instance * Move AWSServiceErrors declaration to a better place * Prevent aliasing from confusing sphinx and remove cached autoapi in clean * Update src/toil/lib/aws/__init__.py Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Change retry loop * Replace assert with raise --------- Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Allow fetching job inputs for debugging (#4848) * Reformat worker * Actually change kwarg name * Enable stopping WDL (and probably CWL) jobs after files are downloaded * Make sure WDL commands get logged before we stop * Add type hints * Add debug flag accessor * Make debug-job default to debug logging * Build fake container environments for CWL and WDL jobs when debugging them * Add an example of dumping job files to the docs * 
Add tests for the file retrieval and container faking * Add missing imports --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Make leader wait for expected updates to be visible in the job store, or fail the job (#4811) * Implement expecting version bumps and fail src/toil/test/batchSystems/batchSystemTest.py::MaxCoresSingleMachineBatchSystemTest::testServices * Actually turn on debug logging for service test * Refer to jobs for space usage accounting by stringified job description and not body file * Use exponential backoff when polling for job updates * Fix comparison direction * Plug the new CLI option * Include version writers in warnings * Make return type annotation correct * Don't wait for new versions of failed jobs because then we're too slow to pass the badWorker tests * Scale down stats tutorial test to fit on small CI runners * Work out that command overrides aren't being removed * Stop having an overloaded command field on JobDescriptions * Fix typos and update architecture to lean less on command * Fix calling the checkpoint restore * Handle None vs. empty successors in tests * Handle places that didn't expect nextSuccessors() to ever be None * Remove extra the * Fix handling jobs that had no bodies, and consolidate warning logic * Always actually do a reset even if no new version is ready. 
* Use has_body accessor more * Rename loadJob variables * Rename _body_spec and use more has_body() * Use a NamedTuple instead of a command-style string to point to the body * Improve JobDescription docstring and fix typoed argument name * Remove worker command from JobDescription * Eliminate references to get_worker_command/set_worker_command --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Enable FUSE for privileged Toil clusters (#4824) * Add option for privileged clusters and enable privileges for toil-managed clusters * Fix syntax error and add back namespace rules * packages might be broken * Dependencies * Move apt clean * Create test image * Create test image 2 * Try just creating the base docker image * test image creation, typo * Try focal debian package * Try the last docker build command * remove nontoil makefile dependencies to test * Successfully build docker images at least for amd64 * Remove unprivileged fuse mount code * Bring back rest of docker builds * Remove unnecessary env var in dockerfile * Fix setuptools and virtualenv to some version and revert whitespace * Apply suggestions from code review Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Move SINGULARITY_CACHEDIR comment * Formatting and move strtobool * Reflect moved functions for imports * Remove debug_mute flag and print debugging statement outside instead --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Detect if the GridEngine worker thread has crashed to prevent hanging the workflow (#4873) * Debug envvar * add error to message * Add logic for unexpected background thread failure * Set block back to true * Don't duplicate thread exception message and print at end * Revert "Debug envvar" This reverts commit 13392858db352da75c8ddfe3b4d13b5d88eccf14. 
* Apply suggestions from code review Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Bump mypy from 1.9.0 to 1.10.0 (#4878) Bumps [mypy](https://github.com/python/mypy) from 1.9.0 to 1.10.0. - [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md) - [Commits](https://github.com/python/mypy/compare/1.9.0...v1.10.0) --- updated-dependencies: - dependency-name: mypy dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * remove SLURM caching override to support caching (#4884) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Add more debug logging for when the job is attempted and the worker is started (#4881) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Update WDL conformance tests on CI (#4876) * Update wdltoil_test.py * Fix typo * Fix version for integration tests --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Replace all usage of boto2 with boto3 (#4868) * Take out boto2 from awsProvisioner.py * Add mypy stub file for s3 * Lazy import aws to avoid dependency if extra is not installed yet * Also lazy import in tests * Separate out wdl kubernetes test to avoid missing dependency * Add unittest main * Fix wdl CI to run separated tests * Fix typo in lookup * Update moto and remove leftover line in node.py * Remove all instances of boto * Fix issues with boto return types and grab attributes before deleting * Remove some unnecessary abstraction * Fix improper types in ec2.py * Ensure UUID is a string for 
boto3 * No more boto * Remove comments * Move attribute initialization * Properly delete all attributes of the item * Move out pager and use pager for select to get around output limits * Turn getter into method * Remove comment in setup.py * Remove commented dead import * Remove stray boto import * Apply suggestions from code review Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Rename, rearrange some code * Revert not passing Value's to attributes when deleting attributes in SDB * Fix missed changed var names * Change ordering of jobstorexists exception to fix improper output on exception --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Revert ensurepip to get-pip (#4900) * docs cleanup (#4889) * fix incorrect file extensions. * fix typos --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Bump to a new major version (#4885) Since #4811 made the batch systems take the command as an argument, we now have to bump the major version to signal incompatibility with any old batch system plugins. Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Warn user. (#4893) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Allow symlinks to inputs as WDL outputs (#4883) * Detect missing files at the offending step and announce the problem conspicuously * Log the offending expression * Resolve symlinks against container mounts during file virtualization * Try and forward along original virtualized filenames --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * bye pytz (#4890) * pytz is not needed in Python 3.9+, or with the zoneinfo backport * make diff_mypy: quieter and target the correct branch * Linting. * Satisfy MyPy more (new MyPy?) 
--------- Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: DailyDreaming <lblauvel@ucsc.edu> * Stop suggesting infinity when validating half-open intervals (#4887) This should fix #4886 by not suggesting to the user that "infinity" is an option value that can be used. It also explains the option intervals in words instead of interval notation, which people might not be expecting. Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Fix WDL option spelling and tolerate Cromwell-isms (#4906) * Fix WDL option spelling and tolerate Cromwell-isms * Linting. * Satisfy MyPy more (new MyPy?) --------- Co-authored-by: DailyDreaming <lblauvel@ucsc.edu> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Remove wrapped CWL doc example. (#4892) * Remove wrapped CWL doc example. * Patch missing links. * Remove AWS-dependent import/test from cwlTest.py. * Missing @slow. * Missing import. * Make SimpleDB retry on EndpointConnectionError * Linting. * Satisfy MyPy more (new MyPy?) 
--------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Add retries to DockerCheckTest.testBadGoogleRepo (#4909) * Add retries to flaky test * get rid of extra import --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Fix 3.8 backport.timezone import (#4908) * Fix 3.8 import and remove dead comment in requirements.txt --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Lon Blauvelt <lblauvel@ucsc.edu> * Update to Python 3.12 (#4901) * Add Python 3.12 to CI * Update sphinx-autoapi and astroid to deal with crash https://github.com/pylint-dev/pylint/issues/8782 * Remove dead comment * Add rules to 3.11 build * update htcondor * Update use of HTcondor in appliance build * Ensure tests are instanced and don't jumble relative paths + debug logging * oops, update utilsTest too * is this a pytest issue? * Add some more log messages * Fix time.sleep * Remove the debug statement in docker * Bump flask-cors from 4.0.0 to 4.0.1 (#4916) Bumps [flask-cors](https://github.com/corydolphin/flask-cors) from 4.0.0 to 4.0.1. - [Release notes](https://github.com/corydolphin/flask-cors/releases) - [Changelog](https://github.com/corydolphin/flask-cors/blob/main/CHANGELOG.md) - [Commits](https://github.com/corydolphin/flask-cors/compare/4.0.0...4.0.1) --- updated-dependencies: - dependency-name: flask-cors dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Try /tmp before the workdir (#4914) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * biocontainer tests: use version corresponding to v2 Docker Image Format (#4912) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Revert "Update to Python 3.12 (#4901)" (#4917) This reverts commit 460846d7ded3820acc505cccb9c866ea9a7a940a. * Bump miniwdl from 1.11.1 to 1.12.0 (#4920) Bumps [miniwdl](https://github.com/chanzuckerberg/miniwdl) from 1.11.1 to 1.12.0. - [Release notes](https://github.com/chanzuckerberg/miniwdl/releases) - [Commits](https://github.com/chanzuckerberg/miniwdl/compare/v1.11.1...v1.12.0) --- updated-dependencies: - dependency-name: miniwdl dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Support Python 3.12 (#4919) * Add Python 3.12 to CI * Update sphinx-autoapi and astroid to deal with crash https://github.com/pylint-dev/pylint/issues/8782 * Remove dead comment * Add rules to 3.11 build * update htcondor * Update use of HTcondor in appliance build * Ensure tests are instanced and don't jumble relative paths + debug logging * oops, update utilsTest too * is this a pytest issue? 
* Add some more log messages * Fix time.sleep * Remove the debug statement in docker * remove logger print statements in utilsTest.py and pin pytest * Up the timeout on some tests (possibly a timing issue) * Up the timeout on more tests * Up the pytest version again * Add documentation for installing batch system plugins (#4926) Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> * Update Werkzeug to appease the GitHub security police (#4925) It looks like if you give away your debugger PIN, people can use your Werkzeug debugger. This is somehow a security issue and was apparently never fixed on Werkzeug 2. Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Remove unused comment --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: William Gao <wlgao@ucsc.edu> Co-authored-by: Adam Novak <anovak@soe.ucsc.edu> Co-authored-by: stxue1 <122345910+stxue1@users.noreply.github.com> Co-authored-by: Brandon Walker <43654521+misterbrandonwalker@users.noreply.github.com> Co-authored-by: Brandon Walker <walkerbd@dali1.dali.hpc.ncats.nih.gov> Co-authored-by: Glenn Hickey <glennhickey@users.noreply.github.com> Co-authored-by: Michael R. Crusoe <1330696+mr-c@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Michael R. Crusoe <michael.crusoe@gmail.com> Co-authored-by: Lon Blauvelt <lblauvel@ucsc.edu> Co-authored-by: Alexandre Detiste <alexandre.detiste@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Andreas Tille <tille@debian.org> Co-authored-by: Theodore Ni <3806110+tjni@users.noreply.github.com> Co-authored-by: Benedict Paten <benedictpaten@gmail.com>
Closes #4354.