Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dask] reduce test times #3786

Merged
merged 2 commits into from
Jan 18, 2021
Merged

Conversation

jameslamb
Copy link
Collaborator

This PR proposes a refactoring of tests/python_package_test/test_dask.py, to reduce the runtime of the Dask tests.

  • makes models smaller by setting n_estimators=10 and num_leaves=10 (the defaults are n_estimators=100 and num_leaves=31)
  • adds the predict_proba() tests for DaskLGBMClassifier into the existing test_classifier test. I don't think effectively testing predict() and predict_proba() requires separate training runs

Impact of this change

I tried running pytest tests/python_package_test/test_dask.py 5 times on master and 5 times on this branch (on my laptop). I found that the changes in this PR reduce the runtime of the Dask tests by around 35 seconds.

I think this speedup will have a meaningful impact on development speed for the Dask module.

before (avg = 137s):

============================ 29 passed, 2 warnings in 150.19s (0:02:30) =============================
============================= 29 passed, 1 warning in 140.21s (0:02:20) =============================
============================ 29 passed, 2 warnings in 132.08s (0:02:12) =============================
============================ 29 passed, 2 warnings in 132.68s (0:02:12) =============================
============================ 29 passed, 2 warnings in 133.04s (0:02:13) =============================

after combining plus less iterations (avg = 102s):

============================= 23 passed, 1 warning in 101.96s (0:01:41) =============================
============================= 23 passed, 1 warning in 103.60s (0:01:43) =============================
============================= 23 passed, 1 warning in 101.74s (0:01:41) =============================
============================= 23 passed, 1 warning in 101.34s (0:01:41) =============================
============================= 23 passed, 1 warning in 102.70s (0:01:42) =============================

Notes for Reviewers

  1. I intentionally did not change n_estimators in the test_regression test. That test requires a lot of iterations from distributed and local training to look similar, and I found that they produced very different results for small values of n_estimators.
  2. If this PR is accepted, I'll add a review comment with similar suggestions on [python-package] [dask] Add DaskLGBMRanker #3708

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks! I find this useful given the growing number of CI jobs and services.

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants