
[tests][dask] replace client fixture with cluster fixture #4159

Merged (3 commits) on Apr 5, 2021

Conversation

jmoralez (Collaborator) commented Apr 3, 2021

Motivation: #4088 (comment)

This replaces the client fixture (which creates and tears down a LocalCluster for every test) with a cluster fixture that creates the cluster once for all tests; every test then creates its own client from it. This significantly reduces the time it takes for test_dask to run.

I replaced all of the client.close(timeout=CLIENT_CLOSE_TIMEOUT) calls with a context manager for the client, so the diff looks awful, but GitHub has an option to hide whitespace changes, which can be toggled with the little gear icon when reviewing the files. If this is too much, I can just add a client = Client(cluster) at the top of every test instead, but I thought it would be nice to unify all the tests around this context manager.
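For reference, here is a minimal sketch of the pattern (the fixture scope and LocalCluster parameters are illustrative, not necessarily the exact ones used in test_dask.py):

import pytest
from distributed import Client, LocalCluster

@pytest.fixture(scope="module")
def cluster():
    # Created once and shared by every test that requests this fixture.
    with LocalCluster(n_workers=2, threads_per_worker=2) as dask_cluster:
        yield dask_cluster

def test_something(cluster):
    # Each test now only opens (and tears down) a lightweight client.
    with Client(cluster) as client:
        pass  # run the actual training/assertions against `client`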

I modified test_errors to use a regular cluster (instead of the async one) as well.

I also had to add a client.restart() in test_machines_should_be_used_if_provided: that test triggers an error on the worker that already has local_listen_port bound while the other worker keeps waiting, and restarting resets that worker.
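The fix itself is a one-liner (placement here is illustrative):

# After the expected error fires on the worker with the bound port:
client.restart()  # restarts all workers, releasing local_listen_port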

In test_error_on_feature_parallel_tree_learner I had to add a client.rebalance() because the error wasn't always being triggered. I assume that's because the test data had two partitions and there was a chance both landed on the same worker, so we weren't actually performing distributed training, which inspired #4149.
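Roughly, the rebalance fix looks like this (a sketch only; the helper name is made up, and it assumes the test data is a dask collection):

from distributed import Client

def spread_partitions(client: Client, X):
    # Materialize the collection on the cluster, then ask the scheduler to
    # move partitions so they don't all end up on a single worker.
    X = X.persist()
    client.rebalance()
    return X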

@jmoralez jmoralez requested a review from jameslamb as a code owner April 3, 2021 06:10
@jmoralez jmoralez closed this Apr 3, 2021
@jmoralez jmoralez reopened this Apr 3, 2021
jmoralez (Collaborator, Author) commented Apr 3, 2021

It just occurred to me that this could also be achieved by defining a new client fixture that creates a client from the cluster fixture defined here. That would involve minimal changes to the existing code (only the client that the tests receive would be different; the test code itself would stay the same). Should I go this way?
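Something like this (a sketch only, assuming the cluster fixture from this PR):

import pytest
from distributed import Client

@pytest.fixture
def client(cluster):
    # Built on top of the cluster fixture; existing tests keep their
    # `client` argument and bodies unchanged.
    with Client(cluster) as dask_client:
        yield dask_client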

jameslamb (Collaborator) left a comment

Thanks so much for doing this!

It looks like this might save on the order of 16 minutes in our CI jobs, that is an AMAZING improvement. 🙌

Linux regular on master (logs)

===== 510 passed, 6 skipped, 2 xfailed, 435 warnings in 1221.07s (0:20:21) =====

Linux regular on this PR (logs)

===== 510 passed, 6 skipped, 2 xfailed, 340 warnings in 204.23s (0:03:24) ======

I'm +1 on this PR, just left one small suggestion. I've also provided some more answers and information below.


I just thought that this could also be achieved by defining a new client fixture that just creates a client from the cluster fixture defined here. This would involve minimal changes to the existing code (only the client that the tests receive would be different, but the code would stay the same). Should I go this way?

If this is just for the sake of code organization but wouldn't have any functional or performance benefit, I'd rather not. Using fixtures of fixtures makes me (maybe irrationally) nervous. I'm ok with the changes you've made in this PR, passing in a cluster and then creating a client from it. I'm also ok with using the client context manager to handle teardowns instead of Client.close().


In the future, when you open a PR or issue that comes from some previous discussion, please link that discussion. That helps anyone who finds this PR from search to find all of the relevant context, and helps reviewers who might not have been part of that original conversation to learn what they need to give an effective review.

In this case, I'm linking #3829 (comment).


You don't have to close and re-open PRs on GitHub to re-trigger CI. In most projects, you can just push an empty commit. In LightGBM we have a slight preference for that because a new commit means new builds, which means it is possible to view the logs from the previous builds, which can be useful in debugging. Some CI services overwrite builds if you close and re-open. This is why, for example, only the most recent build of this PR is currently available on GitHub Actions, even though there have been two.


You can push an empty commit like this:

git commit --allow-empty -m "empty commit"
git push origin tests/cluster-fixture

(two resolved review threads on tests/python_package_test/test_dask.py)
jmoralez (Collaborator, Author) commented Apr 4, 2021

If this is just for the sake of code organization but wouldn't have any functional or performance benefit, I'd rather not.

Yes, this was just to avoid the (seemingly) huge diff.

In the future, when you open a PR or issue that comes from some previous discussion, please link that discussion.

Yeah, that's my bad. I saw you do it on some previous PRs and forgot to do it here.

You don't have to close and re-open PRs on GitHub to re-trigger CI. In most projects, you can just push an empty commit. In LightGBM we have a slight preference for that because a new commit means new builds, which means it is possible to view the logs from the previous builds, which can be useful in debugging. Some CI services overwrite builds if you close and re-open. This is why, for example, only the most recent build of this PR is currently available on GitHub Actions, even though there have been two.

I usually do that, but when I opened this PR it only triggered four checks (WIP, license/cla, continuous-integration/appveyor/pr and Microsoft.LightGBM) and only one ran (WIP); the other three got stuck in "queued", which seemed weird. I didn't try the empty commit because I thought something had gone wrong when opening this PR (it took about 3 minutes for me to be able to see it, so I thought I had misclicked somewhere). So I don't believe anything was lost in this case, but I will definitely try the empty commit first from now on.

jameslamb (Collaborator) commented

You can safely ignore the R-package CI failures; they're not related to this PR's changes. They are part of #4160.

jameslamb (Collaborator) left a comment

Looks great, thanks! Once #4161 is merged, the R tests should be fixed and we'll be able to merge this.

jameslamb (Collaborator) commented

Ok @jmoralez, I think CI should be ok now. At your earliest convenience, can you update this to the latest master? Or, if you give me permission, I can do that directly.

github-actions (bot) commented

This pull request has been automatically locked since there has not been any recent activity after it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023