In theory we should be able to use pytest-xdist to parallelize tests. In practice this doesn't work for a few reasons:
If multiple tests use the same test data, the data files can be corrupted when two or more tests try to download the same file at the same time
There could be resource contention when running locally
When running tests in parallel, it is safest to use per-test cache directories (i.e. don't use the cache_dir config setting or the PYTEST_WDL_CACHE_DIR environment variable). Alternatively, we could make test data localization thread-safe, either by routing all localization through a single daemon or by localizing all files before any tests run.
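For illustration, here is a minimal sketch of thread-safe localization for a shared cache, using a lock file so that only one xdist worker downloads a given URL while the others wait and then reuse the cached copy. This is not current pytest-wdl behavior; the function name, the hashing scheme, and the use of the third-party filelock package are all assumptions.

```python
import hashlib
import urllib.request
from pathlib import Path

from filelock import FileLock  # third-party: pip install filelock


def localize(url: str, cache_dir: Path) -> Path:
    """Download url into cache_dir exactly once, even with concurrent callers."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / hashlib.sha256(url.encode()).hexdigest()
    # Only one worker holds the lock and downloads; the rest block, then reuse the file.
    with FileLock(str(target) + ".lock"):
        if not target.exists():
            partial = target.with_suffix(".partial")
            urllib.request.urlretrieve(url, partial)
            partial.rename(target)  # atomic rename, so readers never see a partial file
    return target
```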
Regarding resource contention, there is no good solution yet; an issue has been open in pytest-xdist for years to address this. One approach is to mark tests that cannot be parallelized and then run two separate test sessions: the first excludes the non-parallelizable tests and uses xdist, the second runs only the non-parallelizable tests serially.
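As a rough sketch of the two-session approach (the serial marker name is made up here, and the test body just assumes pytest-wdl's workflow_data/workflow_runner fixtures):

```python
import pytest


@pytest.mark.serial  # register the marker in pytest.ini/setup.cfg to avoid warnings
def test_local_workflow(workflow_data, workflow_runner):
    # Runs a local (miniwdl/Cromwell) workflow, so it is excluded from the xdist session.
    ...


# Session 1 (parallel): pytest -n auto -m "not serial"
# Session 2 (serial):   pytest -m serial
```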
We could also borrow code from pytest-workflow, which implements its own parallelization strategy.
It might make the most sense to treat local test runners (miniwdl/Cromwell) as non-parallelizable, and anything remote (dxWdl, Cromwell Server) as parallelizable. Since the local tests generally run in Docker, it will be hard to run them in parallel without starving the underlying system of resources.
In pytest-workflow the tests themselves are not parallelized; running the workflows, however, is. The workflows take much more time than the tests. Maybe the pytest-workflow model could be used to achieve better wall-clock times for pytest-wdl as well?
In pytest-workflow, each workflow gets its own runner object instantiated from a Workflow class. These objects are added to a queue, which is then processed in parallel (default: 1 thread) by invoking each object's run method.
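Roughly like this (an illustrative sketch of the pattern, not pytest-workflow's actual classes):

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue


class Workflow:
    """Runner object for a single workflow."""

    def __init__(self, name: str, command: str):
        self.name = name
        self.command = command

    def run(self) -> None:
        # The real plugin would execute the workflow (e.g. via subprocess) and
        # keep its exit code and output paths around for the tests to inspect.
        print(f"running {self.name}: {self.command}")


def run_queue(workflow_queue: Queue, threads: int = 1) -> None:
    """Drain the queue and run each workflow; threads=1 matches the serial default."""
    workflows = []
    while not workflow_queue.empty():
        workflows.append(workflow_queue.get())
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for future in [pool.submit(wf.run) for wf in workflows]:
            future.result()  # re-raise any failure from the workflow run
```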
Pytest has a hook, pytest_runtestloop, which runs after collection has completed but before any test is executed. At that point, all the workflows can be started and run to completion. Once they have finished, the tests can run against the completed workflows.
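A minimal conftest.py sketch of that idea (the WORKFLOW_QUEUE global and the WORKFLOW_THREADS environment variable are assumptions, not pytest-workflow's actual implementation):

```python
# conftest.py
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical queue that collection code would populate with objects exposing run().
WORKFLOW_QUEUE = []


def pytest_runtestloop(session):
    # Start and finish every queued workflow before any test executes.
    threads = int(os.environ.get("WORKFLOW_THREADS", "1"))  # assumed knob, default 1
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for future in [pool.submit(workflow.run) for workflow in WORKFLOW_QUEUE]:
            future.result()  # surface workflow failures before the tests run
    # Returning None lets pytest's built-in runtestloop run the collected tests,
    # which now only assert against the already-completed workflow outputs.
    return None
```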