Feature request: worker pool with `--dist=pool` #794

Garrett-R · 2022-07-12T17:34:06Z

Currently, it seems pytest-xdist divvies up all the work to be done and then the work gets executed. This would be faster if the workers just grabbed tasks out of a pool instead of divvying at the start.

Example:

# test_parallel.py

import sys
from time import sleep

# hack for live prints (https://stackoverflow.com/questions/27006884)
sys.stdout = sys.stderr


def test_1():
    print('test 1 (fast)')


def test_2():
    print('test 2 (slow)')
    sleep(5)


def test_3():
    print('test 3 (fast)')


def test_4():
    print('test 4 (slow)')
    sleep(5)

Running this with

pytest -s -n 2 test_parallel.py

takes 11.8s, presumably because one worker gets unlucky and is assigned both test_2 and test_4, while the other worker finishes almost immediately and then sits idle. Note: this repro may not be reliable, since there's no guaranteed grouping. Try reordering the functions to reproduce it if necessary.

This is an extreme example, but I also see this in the actual codebase I work on, where one worker keeps running for a non-negligible time after the others have finished.
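To make the difference concrete, here is a rough simulation of the two strategies for the four tests above. It is only a sketch under simplifying assumptions: the function names are made up for illustration, the "static" variant assumes a plain round-robin split (which is not necessarily what pytest-xdist does), and startup and communication overhead are ignored.

```python
import heapq


def static_split_makespan(durations, n_workers):
    """Assign tests round-robin up front; each worker runs only its own share."""
    totals = [0.0] * n_workers
    for index, duration in enumerate(durations):
        totals[index % n_workers] += duration
    # The run ends when the most heavily loaded worker finishes.
    return max(totals)


def shared_pool_makespan(durations, n_workers):
    """Each worker takes the next pending test as soon as it becomes free."""
    free_at = [0.0] * n_workers  # min-heap of times at which workers go idle
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + duration)
    return max(free_at)


# test_1 (fast), test_2 (slow), test_3 (fast), test_4 (slow)
durations = [0.0, 5.0, 0.0, 5.0]
print("static split:", static_split_makespan(durations, 2))
print("shared pool: ", shared_pool_makespan(durations, 2))
```

Under these assumptions the static split puts both slow tests on the same worker, while the shared pool lets the idle worker pick up the second slow test.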


Garrett-R · 2022-07-12T17:38:01Z

From the docs, this pool behavior is actually how I was expecting --dist load to work given it says: "Sends pending tests to any worker that is available". So perhaps this is not a feature request, but rather a bug?

RonnyPfannschmidt · 2022-07-12T18:15:37Z

There is a queue of tasks available for each worker. The scheduler was designed with network usage in mind (10-15 years ago that was common usage by the PyPy team).

It's simply the case that bundling, back-pressure and test cases with long run times have been integrated.

There are numerous issues on the topic already.

Garrett-R · 2022-07-12T18:25:14Z

I don't have enough context to fully follow that, although maybe you're not aiming that at me. 😆 Sorry if this is a dupe!

amezin · 2023-01-11T16:35:30Z

I believe this is now solved by #862 and --dist worksteal. It isn't exactly the same algorithm as described here, but it also does load [re-]balancing much more actively than the default scheduler.

BTW, the scheduling algorithm where workers just grab tests one by one when ready, without any order, likely wouldn't be the best for fixture reuse.
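A toy illustration of the fixture-reuse concern, assuming two workers, two hypothetical modules with four equally fast tests each, and a module-scoped fixture that has to be set up once per worker that runs any test from that module. The names and the counting model are illustrative only, not xdist internals.

```python
def module_fixture_setups(assignment):
    """Count distinct (worker, module) pairs in a list of (worker, test_id) items.

    Each pair stands for one setup of a module-scoped fixture, since every
    worker is a separate process and cannot share the fixture instance.
    """
    return len({(worker, test_id.split("::")[0]) for worker, test_id in assignment})


tests = [f"{module}::test_{i}" for module in ("test_a.py", "test_b.py") for i in range(4)]

# Contiguous chunks: worker 0 gets all of test_a.py, worker 1 all of test_b.py.
chunked = [(0 if i < 4 else 1, test_id) for i, test_id in enumerate(tests)]

# One-by-one grabbing with equal durations: workers alternate through the list.
one_by_one = [(i % 2, test_id) for i, test_id in enumerate(tests)]

print("chunked setups:   ", module_fixture_setups(chunked))
print("one-by-one setups:", module_fixture_setups(one_by_one))
```

In this model the one-by-one order spreads every module across both workers, so each module-scoped fixture is set up twice instead of once.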

amezin mentioned this issue Dec 20, 2022

Implement work-stealing scheduler #858

Closed

amezin closed this as completed Jan 11, 2023
