Ability to execute certain tests sequentially while using -n (xdist) #385

Open
cauvery opened this issue Dec 5, 2018 · 33 comments

@cauvery

cauvery commented Dec 5, 2018

Hi,

I am looking for a way to make a certain group of tests run sequentially while the other tests continue to run in parallel using -n (xdist).

I have a group of tests that cannot be executed in parallel (only a small subset of the whole suite). I do not want to create another job to run this small set without -n. I searched but did not find an actual solution anywhere.

The versions I am using are:
pytest-xdist 1.20.1
pytest 3.2.5
Python 2.7.13

Thanks in advance.

@cauvery cauvery changed the title ability to execute cetain tests sequentially while using -n (xdist) Ability to execute cetain tests sequentially while using -n (xdist) Dec 5, 2018
@cauvery cauvery changed the title Ability to execute cetain tests sequentially while using -n (xdist) Ability to execute certain tests sequentially while using -n (xdist) Dec 5, 2018
@RonnyPfannschmidt RonnyPfannschmidt transferred this issue from pytest-dev/pytest Dec 5, 2018
@Horstage

Horstage commented Dec 5, 2018

A @pytest.mark would be great for that purpose!

@RonnyPfannschmidt
Member

currently there isn't enough metadata being transferred between the worker and the scheduler to facilitate this

@nicoddemus
Member

Indeed we don't have a solution for that currently.

@cauvery what we do at work is to mark some tests with a known mark (@pytest.mark.serial) and then execute pytest twice: once with xdist excluding the mark, and then without xdist but including the mark:

$ pytest -n auto -m "not serial"
...
$ pytest -m "serial"

This way the serial-marked tests will execute sequentially in their own process. Just leaving this here in case you did not find that solution elsewhere. 👍

@cauvery
Author

cauvery commented Dec 5, 2018

Thank you @nicoddemus, I have seen this solution, but with it I need to create two separate jobs in Jenkins: one for parallel execution and one for sequential execution, which I would rather avoid in my project (having another job just for a small subset).

Just FYI, I can specify the two command lines in a single job, but the reports and result will be for the last command only.

-Cauvery.
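
For readers who want to keep the two-run workaround in a single job, here is a minimal sketch of a driver script. It assumes the marker is named serial as in the suggestion above; the function names build_commands and run_all and the reports/ directory are hypothetical. Each run writes its own report through pytest's --junitxml option so the second run does not overwrite the first, and the job fails if either run fails.

```python
import subprocess
import sys


def build_commands(workers="auto", marker="serial", report_dir="reports"):
    """Return the two pytest command lines: the parallel run, then the serial run."""
    parallel = [
        sys.executable, "-m", "pytest",
        "-n", str(workers),            # distribute across xdist workers
        "-m", f"not {marker}",         # everything except the marked tests
        f"--junitxml={report_dir}/parallel.xml",
    ]
    serial = [
        sys.executable, "-m", "pytest",
        "-m", marker,                  # only the marked tests, without xdist
        f"--junitxml={report_dir}/serial.xml",
    ]
    return [parallel, serial]


def run_all(commands, runner=subprocess.call):
    """Run every command even if an earlier one fails.

    Returns the first non-zero exit code, or 0 if all commands succeeded.
    """
    exit_codes = [runner(cmd) for cmd in commands]
    return next((code for code in exit_codes if code != 0), 0)
```

A small script that ends with sys.exit(run_all(build_commands())) could then be the only command in the job. The runner parameter exists only so the logic can be exercised without launching pytest.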

@cauvery
Author

cauvery commented Dec 5, 2018

Does anyone else have an alternative solution or workaround for this?

@akr8986

akr8986 commented Dec 7, 2018

@cauvery

What kind of scheduling in xdist have you opted for?

Are the test cases which you want to execute sequentially spread across multiple different Python test modules/packages?

@zoltan-fedor

I have a similar need to @cauvery

In my case I have some integration tests (generated through @pytest.mark.parametrize) which make modifications to a shared object, and a fixture which always resets that shared object to a known, initial state.

Unfortunately, when these test cases end up running in different workers, they can "step on each other's toes".

It would be nice to have a marker in xdist which would allow us to force some tests to run in the same worker, i.e. sequentially.

@cauvery
Author

cauvery commented Dec 11, 2018

@akr8986 to answer your questions:
Q: What kind of scheduling in xdist have you opted for?
A: I am using -n 3 on the command line. Does that answer the question, or did I get the question wrong?

Q: Are the test cases which you want to execute sequentially spread across multiple different Python test modules/packages?
A: Yes, they are spread across modules.

Thanks,
Cauvery.

@akr8986

akr8986 commented Dec 12, 2018

@cauvery, xdist supports 4 scheduling algorithms: "each", "load", "loadscope" and "loadfile". -n is a shortcut for load scheduling, meaning the scheduler will load balance the tests across the workers.

With loadfile scheduling, all test cases in a test file are executed sequentially by the same worker.

From your requirement, you need something similar to loadfile, but with the scheduling based on some marker rather than on the file. I would suggest writing a new scheduler altogether: mark all the tests with a group name, then distribute the tests having the same marker to a single worker and the others to the rest of the workers.

The scheduler itself is not a plugin that can be installed in an environment; perhaps @nicoddemus can comment on that.
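
To make the marker-based grouping idea concrete, here is a small standalone simulation. It is not xdist's scheduler and uses no xdist API; the function name distribute and the (test_id, group) input shape are assumptions made for illustration. Tests that share a group name form one unit of work that is handed to a single worker, while every ungrouped test is its own unit.

```python
from collections import defaultdict
from itertools import cycle


def distribute(tests, num_workers):
    """Assign tests to workers so that tests sharing a group land on one worker.

    `tests` is a list of (test_id, group) pairs; group is None for ungrouped tests.
    Returns {worker_index: [test_id, ...]} with tests in execution order.
    """
    # A grouped test's unit of work is its group; an ungrouped test is its own unit.
    units = defaultdict(list)
    for test_id, group in tests:
        key = ("group", group) if group is not None else ("test", test_id)
        units[key].append(test_id)

    # Hand out whole units round-robin; a real scheduler would balance by load instead.
    assignment = {worker: [] for worker in range(num_workers)}
    for worker, unit_tests in zip(cycle(range(num_workers)), units.values()):
        assignment[worker].extend(unit_tests)
    return assignment
```

Because a unit is never split, all tests of one group run one after another on the same worker, which is the property the request in this issue is after.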

@kdelee

kdelee commented Mar 22, 2019

Over on my team we are trying to figure out if there is a way we could use pytest-xdist, but we have similar blockers, and a home-grown tool called https://github.com/ansible/pytest-mp was born out of those. It does some nice things, such as allowing us to group tests that may be run together but not with other tests, or tests that need to be run isolated and in serial, but it does not do other things as well, such as testing multiple targets at once or spreading the test execution across multiple nodes.

It would be nice if we could benefit from the larger community that pytest-xdist enjoys; we will think about whether there is a way to get some of this functionality we rely on into xdist.

🤔

@neXussT

neXussT commented Oct 23, 2019

If you're still considering this feature request, my company is looking for something where we can run some tests in parallel, but others sequentially due to resource sharing. This would be a very useful feature for pytest-xdist.

@atzorvas

Isn't this enough?
py.test -m sequential
py.test -m "not sequential" -n8

@neXussT

neXussT commented Oct 26, 2019

@atzorvas - I hoped it would be, but when I tried this, I ran into two severe problems with pytest-xdist, or how pytest works:

The latter is causing me a lot of problems, because it's creating collisions when I attempt to run tests on an AWS resource. If I have five processes, and they all configure the lambda I'm using at the same time, it fails. I have a session fixture to do this once, at the start of the test session. It must not be run more than once.

I'm pretty much stuck for finding a way to run tests in parallel. I'm likely going to have to execute jenkins runs in parallel, each with a subset of the tests. Not ideal. If anybody has a solution, I would love to investigate it.

@symonk
Member

symonk commented Oct 31, 2019

For the session fixture / run-once problem, you should use a lock file: one process creates the lock file and then runs the run-once code, while the other processes poll for the lock file to be deleted before they carry on. We do exactly this.
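
A minimal standard-library sketch of this lock-file idea follows. The helper name run_once and the file names setup.lock and setup.done are hypothetical. It also makes one change to the description above, as an assumption on my part: instead of deleting the lock file, the owner writes a separate "done" marker, so that a process arriving late cannot mistake a deleted lock for "nobody has run the setup yet".

```python
import os
import time
from pathlib import Path


def run_once(workdir, setup, poll_interval=0.05, timeout=60.0):
    """Run setup() in exactly one of the processes that share `workdir`.

    Returns True in the process that ran setup(), False in all the others.
    """
    lock = Path(workdir) / "setup.lock"
    done = Path(workdir) / "setup.done"
    try:
        # O_CREAT | O_EXCL makes creation atomic: only one process can succeed.
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # Another process owns the setup; poll until it signals completion.
        deadline = time.monotonic() + timeout
        while not done.exists():
            if time.monotonic() > deadline:
                raise TimeoutError(f"setup did not finish within {timeout}s")
            time.sleep(poll_interval)
        return False
    os.close(fd)
    setup()
    done.touch()  # tell the waiting processes that setup has finished
    return True
```

The directory passed as workdir has to be shared by all workers and fresh for every test session, otherwise markers left over from a previous run would suppress the setup.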

@neXussT

neXussT commented Nov 1, 2019

Thanks @symonk . That sounds like a pretty decent solution, but probably won't work for my problem. Since I am updating AWS Lambda environmental variables as a session fixture, each process would execute the same fixture, while other processes are finished, causing intermittent issues.

The way I've designed my tests doesn't seem compatible with multi-processor testing.

@nicoddemus
Member

nicoddemus commented Nov 2, 2019

That sounds like a pretty decent solution, but probably won't work for my problem. Since I am updating AWS Lambda environmental variables as a session fixture, each process would execute the same fixture, while other processes are finished, causing intermittent issues.

Perhaps you can write the environment variables to the file protected by the lock, something like (untested):

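import json
import os

import pytest
from filelock import FileLock

# create_env_vars() stands for your own function returning a dict of str -> str

@pytest.fixture(scope="session")
def my_session_fix(tmp_path_factory, worker_id):
    if worker_id == "master":
        # not executing with multiple workers, so no locking is needed
        env_vars = create_env_vars()
    else:
        # get the temp directory shared by all workers
        root_tmp_dir = tmp_path_factory.getbasetemp().parent
        f = root_tmp_dir / 'envs.json'
        # only one worker at a time can hold the lock
        with FileLock(str(f) + '.lock'):
            if not f.is_file():
                # the first worker to get here creates the variables and caches them
                env_vars = create_env_vars()
                f.write_text(json.dumps(env_vars))
            else:
                # later workers reuse what the first worker wrote
                env_vars = json.loads(f.read_text())

    os.environ.update(env_vars)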

(This uses filelock)

Only one process each time will be able to get to the envs.json file: the first process creates the file with the environment variables encoded as a JSON file, the next processes will only read from the file.

(EDIT: my initial take did not really work, have fixed it now using the same approach as in #483).

EDIT: there's a documented version in the docs now: https://pytest-xdist.readthedocs.io/en/latest/how-to.html#making-session-scoped-fixtures-execute-only-once

nicoddemus added a commit to nicoddemus/pytest-xdist that referenced this issue Nov 2, 2019
nicoddemus added a commit to nicoddemus/pytest-xdist that referenced this issue Nov 2, 2019
@nicoddemus
Member

nicoddemus commented Nov 2, 2019

I've opened #483 adding an example to the README, would appreciate reviews. 👍

nicoddemus added a commit to nicoddemus/pytest-xdist that referenced this issue Nov 2, 2019
@neXussT

neXussT commented Nov 3, 2019

Thanks @nicoddemus. After looking at your code, and reviewing @symonk 's last post again, this could be a viable solution for me. I probably only have to put this file lock wrapper around the session, and module fixtures, probably with just a decorator.

@nicoddemus
Member

Cool (I've updated the example above again after realizing it was mixing code from #483)

@qwordy

qwordy commented Aug 5, 2020

I also eagerly need this feature! The problem is that if I run pytest twice, I get two sets of results.

$ pytest -n auto -m "not serial"
$ pytest -m "serial"
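
If the two runs each write a JUnit XML report (pytest's --junitxml option), one possible way to end up with a single result is to merge the files afterwards. The sketch below uses only the standard library; the function name merge_junit_reports and the file names in the usage note are hypothetical, and the per-suite counters are left untouched rather than summed.

```python
import xml.etree.ElementTree as ET


def merge_junit_reports(paths, output_path):
    """Combine the <testsuite> elements of several JUnit XML files into one file.

    Returns the number of test suites written.
    """
    merged = ET.Element("testsuites")
    for path in paths:
        root = ET.parse(path).getroot()
        # Depending on the pytest version, the root is either a <testsuites>
        # wrapper or a bare <testsuite>; handle both shapes.
        suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
        merged.extend(suites)
    ET.ElementTree(merged).write(output_path, encoding="utf-8", xml_declaration=True)
    return len(merged)
```

For example, merge_junit_reports(["parallel.xml", "serial.xml"], "merged.xml") would produce one file that a CI server can pick up as a single report.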

@aorestr

aorestr commented Nov 6, 2020

This would be very cool indeed. I run some functional tests against a running system. Some of the tests have a very heavy impact on the system, so I need to run those tests in isolation. A marker would be perfect :)

@brandon-leapyear

brandon-leapyear commented Nov 6, 2021

This is an old work account. Please reference @brandonchinn178 for all future communication


Just wrote this patch; I think it works. It would be nice to have this provided out of the box. (It doesn't actually solve my particular use-case, as I need these serial tests to run without any other tests running in parallel, but it solves the problem in this ticket)

import pytest
from xdist import is_xdist_controller
from xdist.scheduler import LoadScopeScheduling  # originally posted as xdist.scheduling; the module is xdist.scheduler

def pytest_configure(config):
    config.pluginmanager.register(XDistSerialPlugin())

class XDistSerialPlugin:
    def __init__(self):
        self._nodes = None

    @pytest.hookimpl(tryfirst=True)
    def pytest_collection(self, session):
        if is_xdist_controller(session):
            self._nodes = {
                item.nodeid: item
                for item in session.perform_collect(None)
            }
            return True

    def pytest_xdist_make_scheduler(self, config, log):
        return SerialScheduling(config, log, nodes=self._nodes)


class SerialScheduling(LoadScopeScheduling):
    def __init__(self, config, log, *, nodes):
        super().__init__(config, log)
        self._nodes = nodes

    def _split_scope(self, nodeid):
        node = self._nodes[nodeid]
        if node.get_closest_marker("serial"):
            # put all `@pytest.mark.serial` tests in same scope, to
            # ensure they're all run in the same worker
            return "__serial__"

        # otherwise, each test is in its own scope
        return nodeid

@nicoddemus
Member

nicoddemus commented Nov 6, 2021

I need these serial tests to run without any other tests running in parallel

We have the same requirement at work, but we solved it differently:

  1. We mark the tests that need to run serially with a @pytest.mark.serial mark.

  2. Execute pytest in parallel, excluding the tests with the mark:

    $ pytest -n auto -m "not serial"
    
  3. Execute pytest again serially, selecting only the marked tests:

    $ pytest -m "serial"
    

Just adding that there's this alternative.

EDIT: just to note too that you can execute the two commands in the same job, you don't need separate jobs.

@vamotest

vamotest commented Jun 10, 2022

Just wrote this patch; I think it works. It would be nice to have this provided out of the box. (It doesn't actually solve my particular use-case, as I need these serial tests to run without any other tests running in parallel, but it solves the problem in this ticket)

@brandon-leapyear, thank you very much for this patch, it helped me a lot.
The only thing is that the imports have changed a little
from xdist.scheduling -> from xdist.scheduler

Used pytest==7.1.2 and Python 3.7.8

@ngyingkai

Just wrote this patch; I think it works. It would be nice to have this provided out of the box. (It doesn't actually solve my particular use-case, as I need these serial tests to run without any other tests running in parallel, but it solves the problem in this ticket)

@brandon-leapyear, thank you very much for this patch, it helped me a lot. The only thing is that the imports have changed a little from xdist.scheduling -> from xdist.scheduler

Used pytest==7.1.2 and Python 3.7.8

@vamotest Sorry but how do you apply the patch? I tried doing it by running the patch command in xdist folder but doesn't seem to work?

@vamotest

vamotest commented Jul 7, 2022

@ngyingkai

The above piece of code needs to go in conftest.py.

Then mark the tests themselves:

import pytest

@pytest.mark.serial
def test_my_awesome_serial():
    pass

# needs the test_ prefix, otherwise pytest will not collect it
def test_my_parallel():
    pass

And run

PYTHONHASHSEED=0 python3 -m pytest -n 4 --alluredir=allure_data

Then all serial tests will be run on one worker, and the rest in parallel on others.

This works fine for my integration tests, but for unit tests with the same setup I get the error "Unexpectedly no active workers available".
From the point of view of the test run everything looks fine, but the job fails with worker_internal_error.

This is what I managed to extract from the traceback (full traceback):

tests_1        | INTERNALERROR> E                 return self._hookexec(self, self.get_hookimpls(), kwargs)
tests_1        | INTERNALERROR> E               File "/usr/local/airflow/.local/lib/python3.7/site-packages/pluggy/manager.py", line 93, in _hookexec
tests_1        | INTERNALERROR> E                 return self._inner_hookexec(hook, methods, kwargs)
tests_1        | INTERNALERROR> E               File "/usr/local/airflow/.local/lib/python3.7/site-packages/pluggy/manager.py", line 87, in <lambda>
tests_1        | INTERNALERROR> E                 firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
tests_1        | INTERNALERROR> E               File "/usr/local/airflow/.local/lib/python3.7/site-packages/pluggy/callers.py", line 208, in _multicall
tests_1        | INTERNALERROR> E                 return outcome.get_result()
...
tests_1        | =========== 3459 passed, 85 skipped, 7 warnings in 310.54s (0:05:10) ===========
tests_1        | RESULT_CODE=1

Perhaps it has to do with the firstresult of the hook and loop_once.

I found a similar issue, but it seems to have been resolved in another issue back in 2019. Yet I have the latest versions of pytest and pytest-xdist.

Could you help, please, @brandon-leapyear or @nicoddemus?
I don't want to run the serial tests first and then the parallel ones, or create a separate stage for each of them.

@felixmeziere

Badly need this too!

@WilliamDEdwards

Use --dist=loadgroup (introduced in 2.5.0).

This allows you to run tests in parallel by default, and run specifically marked tests serially.

From https://pytest-xdist.readthedocs.io/en/latest/distribution.html:

[...] guarantees that all tests with same xdist_group name run in the same worker. Tests without the xdist_group mark are distributed normally as in the --dist=load mode.

The example below runs test_banana and test_apple in the same worker. The other tests are run as usual, i.e. they are distributed across workers.

import pytest

@pytest.mark.xdist_group(name="fruit")
def test_banana():
	print('banana')

@pytest.mark.xdist_group(name="fruit")
def test_apple():
	print('apple')

def test_broccoli():
	print('broccoli')

def test_carrot():
	print('carrot')

def test_mushroom():
	print('mushroom')

def test_fungus():
	print('fungus')

@themperek

@WilliamDEdwards

Use --dist=loadgroup (introduced in 2.5.0).
This allows you to run tests in parallel by default, and run specifically marked tests serially.

But how does this control the order of execution? As far as I can see, it only controls the place/worker of execution.

@WilliamDEdwards

WilliamDEdwards commented Nov 9, 2022

@WilliamDEdwards

Use --dist=loadgroup (introduced in 2.5.0).
This allows you to run tests in parallel by default, and run specifically marked tests serially.

But how does this control the order of execution? As far as I can see, it only controls the place/worker of execution.

One worker can process one test at a time. If specific tests are executed by one worker, they are run serially by definition.

WenjieDu added a commit to WenjieDu/PyPOTS that referenced this issue Mar 31, 2023
…ute tasks sequentially;

Some test tasks need to be executed sequentially, but we're using pytest-xdist to accelerate the testing process. To solve this problem, refer to pytest-dev/pytest-xdist#385 (comment). Please note that it needs pytest-xdist >= v2.5.0.
@maxdml

maxdml commented Feb 16, 2024

It would be great to be able to order a set of concurrent tests and a set of sequential tests without writing a custom scheduler (for instance, have the sequential tests run before the parallel tests, or vice versa).

@PrimeQA-Dev

We cannot apply "@pytest.mark.xdist_group(name="fruit")" dynamically via pytest_collection_modifyitems. To work around this, we can execute the tests in 2 subprocesses, like:

import subprocess

# Run sequential tests on a single worker
seq_process = subprocess.Popen(["pytest", "-n", "1", "-m", "sequential"])

# Run parallel tests on multiple workers
par_process = subprocess.Popen(["pytest", "-n", "3", "-m", "parallel"])
