[dask] fix teardown issues in Dask tests (fixes #3829) #3869

Merged 46 commits on Jan 29, 2021

Collaborator

@jameslamb jameslamb commented Jan 27, 2021

See #3829 (comment) for an explanation of this pull request.

The test results below show the builds I used to try to establish how often this teardown timeout issue happens. See #3829 (comment) for a summary.

This PR proposes just increasing the timeout on the client.close() calls used in Dask tests, to 60 seconds. It also adds client.close() calls to two tests that were missing them.
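As a rough illustration of the change described above (not the actual test code), the pattern amounts to defining one shared timeout constant and passing it to every client.close() call. In this sketch the value 60 is taken from the description, `run_test_body` is a hypothetical name, and a `unittest.mock.Mock` stands in for the real `distributed` client so that the snippet runs without a cluster.

```python
from unittest.mock import Mock

# Assumed value, taken from the 60 seconds mentioned in the description.
CLIENT_CLOSE_TIMEOUT = 60


def run_test_body(client):
    """Hypothetical stand-in for a Dask test that receives the `client` fixture."""
    # ... training and assertions would go here ...

    # Explicit teardown at the end of the test, with a generous timeout.
    client.close(timeout=CLIENT_CLOSE_TIMEOUT)


# A Mock stands in for distributed.Client (assumption for illustration only).
fake_client = Mock()
run_test_body(fake_client)

# The close call received the shared timeout as a keyword argument.
fake_client.close.assert_called_once_with(timeout=CLIENT_CLOSE_TIMEOUT)
```

Keeping the value in a single constant means a later adjustment to the timeout only has to be made in one place.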


Test runs with no changes to anything in python-package/ or tests/

sdist (5 Linux, 5 Linux_latest per build, Python=3.8)

build 1: ✔️ 10, ❌ 0
build 2: ✔️ 10, ❌ 0
build 3: ✔️ 10, ❌ 0
build 4: ✔️ 10, ❌ 0
build 5: ✔️ 10, ❌ 0

bdist (5 Linux, 5 Linux_latest per build, Python=3.8)

build 1: ✔️ 10, ❌ 0
build 2: ✔️ 9, ❌ 1

Linux bdist timeout failure (1)

build 3: ✔️ 10, ❌ 0
build 4: ✔️ 10, ❌ 0
build 5: ✔️ 10, ❌ 0

regular (5 Linux, 5 Linux_latest per build, Python=3.8)

build 1: ✔️ 10, ❌ 0
build 2: ✔️ 8, ❌ 2

Linux regular timeout failure (2)

build 3: ✔️ 10, ❌ 0
build 4: ✔️ 10, ❌ 0
build 5: ✔️ 10, ❌ 0

bdist (5 Linux, 5 Linux_latest per build, Python=3.7)

build 1: ✔️ 10, ❌ 0
build 2: ✔️ 10, ❌ 0
build 3: ✔️ 10, ❌ 0
build 4: ✔️ 9, ❌ 1

Linux bdist timeout failure (1)

build 5: ✔️ 10, ❌ 0

regular (5 Linux, 5 Linux_latest per build, Python=3.7)

build 1: ✔️ 10, ❌ 0
build 2: ✔️ 10, ❌ 0
build 3: ✔️ 10, ❌ 0
build 4: ✔️ 10, ❌ 0
build 5: ✔️ 10, ❌ 0

@jameslamb jameslamb changed the title from "WIP: [dask] fix teardown issues in Dask tests (fixes #3829)" to "[dask] fix teardown issues in Dask tests (fixes #3829)" Jan 28, 2021
@jameslamb jameslamb marked this pull request as ready for review January 28, 2021 21:44
@jameslamb jameslamb requested a review from StrikerRUS January 28, 2021 21:44
@jameslamb
Collaborator Author

Based on #3829 (comment), I believe this is ready to review.

@@ -172,7 +176,7 @@ def test_classifier(output, centers, client, listen_port):
     assert_eq(p1_local, p2)
     assert_eq(y, p1_local)
 
-    client.close()
+    client.close(timeout=CLIENT_CLOSE_TIMEOUT)
Collaborator

Maybe we can wrap client as a resource-like fixture?

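import pytest

@pytest.fixture()
def resource():
    print("setup")     # runs before the test that requests the fixture
    yield "resource"   # value handed to the test
    print("teardown")  # runs after the test finishes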

https://stackoverflow.com/a/39401087

It will reduce code duplication and help to avoid cases where we forget to add this code line.
#3829 (comment)

Collaborator Author

The client we import from distributed is already itself a pytest fixture: https://github.com/dask/distributed/blob/b5c36b587f0e3295fe59db330603d5c78b39331f/distributed/utils_test.py#L543-L547

I'm really nervous about adding a second layer of @pytest.fixture() on top of this, when this PR's goal is specifically to fix an error caused by instability in teardowns. I'm worried that will expose us to new types of problems related to having fixtures-of-fixtures.
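
For comparison only, and not something this PR does: one way to guarantee the close call without stacking a second pytest fixture on top of `client` might be a plain context manager used inside the test body. In this sketch `closing_client` is a hypothetical helper, the 60-second value is assumed from the PR description, and a `unittest.mock.Mock` stands in for the real client.

```python
import contextlib
from unittest.mock import Mock

# Assumed value, taken from the 60 seconds mentioned in the PR description.
CLIENT_CLOSE_TIMEOUT = 60


@contextlib.contextmanager
def closing_client(client, timeout=CLIENT_CLOSE_TIMEOUT):
    """Hypothetical helper: yield the client, then always close it with a timeout."""
    try:
        yield client
    finally:
        # Runs even if the body of the `with` block raises.
        client.close(timeout=timeout)


# A Mock stands in for the `client` fixture (assumption for illustration only).
fake_client = Mock()
try:
    with closing_client(fake_client):
        raise AssertionError("simulated test failure")
except AssertionError:
    pass

# The client was still closed exactly once, with the shared timeout.
fake_client.close.assert_called_once_with(timeout=CLIENT_CLOSE_TIMEOUT)
```

Because the close happens in a `finally` block, it also runs when an assertion inside the `with` block fails, which a close call placed on the last line of a test does not.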

Collaborator

@StrikerRUS StrikerRUS left a comment

Thanks a lot for all your hard work!

@jameslamb
Collaborator Author

jameslamb commented Jan 29, 2021

OK, thanks for the review and the conversation in #3829! I'm going to merge this; let's see if it helps.

@jameslamb jameslamb merged commit 42d1633 into master Jan 29, 2021
@jameslamb jameslamb deleted the fix/dask-teardown branch January 29, 2021 00:14
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023