[dask] add support for eval sets and custom eval functions #4101

ffineis · 2021-03-23T23:07:16Z

Followup PR regarding #3952 - implements eval_set functionality for lightgbm.dask but without early stopping.

This is implemented to work with all eval-* parameters:

multiple eval sets (i.e. multiple (X, y) pairs in eval_set)
eval_names
eval_sample_weight
eval_class_weight (for DaskLGBMClassifier only)
eval_init_score
eval_group (for DaskLGBMRanker only)

When an individual eval_set, eval_sample_weight, eval_init_score, or eval_group is the same object as (data, label)/sample_weight/init_score/group, the training object is reused instead of computing the training set/weights/init_score/group multiple times.
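A minimal sketch of that reuse check, using plain Python lists in place of Dask collections. The helper name `build_eval_parts` and the `to_parts` callback are hypothetical and are not taken from the PR. An identity test (`is`) is used rather than equality so that nothing has to be computed to make the decision.

```python
from typing import Any, Callable, List, Tuple


def build_eval_parts(
    data: Any,
    label: Any,
    eval_set: List[Tuple[Any, Any]],
    to_parts: Callable[[Any], List[Any]],
) -> List[Tuple[List[Any], List[Any]]]:
    """Split each eval pair into parts, reusing the training parts when possible.

    Hypothetical helper for illustration only; `to_parts` stands in for
    whatever (potentially expensive) step splits a collection into parts.
    """
    data_parts = to_parts(data)
    label_parts = to_parts(label)
    result = []
    for eval_X, eval_y in eval_set:
        # `is` compares object identity, so no data is touched or compared.
        X_parts = data_parts if eval_X is data else to_parts(eval_X)
        y_parts = label_parts if eval_y is label else to_parts(eval_y)
        result.append((X_parts, y_parts))
    return result


# Record every call to the splitting step to show that training data is split once.
calls: List[Any] = []


def split_in_two(collection: List[Any]) -> List[List[Any]]:
    calls.append(collection)
    return [collection[:2], collection[2:]]


X, y = [1, 2, 3, 4], [0, 1, 0, 1]
X_val, y_val = [5, 6, 7, 8], [1, 1, 0, 0]
parts = build_eval_parts(X, y, [(X, y), (X_val, y_val)], split_in_two)
```

With the training pair passed as the first eval set, the splitting step runs four times (X, y, X_val, y_val) instead of six.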

This is all that's going on, making little mini eval sets out of delayed parts in a consistent manner:
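As a rough illustration of the idea (not the PR's actual code), the sketch below pairs up the X and y chunks of each eval set and assigns them to training parts in a fixed order. The round-robin assignment, the function name `attach_eval_parts`, and the string placeholders standing in for delayed objects are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple


def attach_eval_parts(
    n_train_parts: int,
    eval_sets: Sequence[Tuple[Sequence[Any], Sequence[Any]]],
) -> Dict[int, Dict[int, List[Tuple[Any, Any]]]]:
    """Map training-part index -> eval-set index -> list of (X_chunk, y_chunk).

    Hypothetical helper: chunk j of every eval set goes to training part
    j % n_train_parts, so the assignment is deterministic across eval sets.
    """
    assigned: Dict[int, Dict[int, List[Tuple[Any, Any]]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for eval_idx, (X_chunks, y_chunks) in enumerate(eval_sets):
        if len(X_chunks) != len(y_chunks):
            raise ValueError("X and y of an eval set must have the same number of chunks")
        for chunk_idx, pair in enumerate(zip(X_chunks, y_chunks)):
            part_idx = chunk_idx % n_train_parts
            assigned[part_idx][eval_idx].append(pair)
    # Convert nested defaultdicts to plain dicts for a predictable return type.
    return {part: dict(per_eval) for part, per_eval in assigned.items()}


# Strings stand in for delayed chunks of two eval sets.
eval_sets = [
    (["Xa0", "Xa1", "Xa2"], ["ya0", "ya1", "ya2"]),
    (["Xb0"], ["yb0"]),
]
mini_eval_sets = attach_eval_parts(2, eval_sets)
```

In this example the second training part never sees a chunk of the second eval set, which is the kind of uneven allocation the warning described below is meant to surface.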

Other things to know:

Raises a warning when a worker does not receive any eval_set parts. This check is now performed prior to the client.submit call in the _train function. Model training still completes in this scenario, but depending on which worker returns its futures_classifier, the best_score_ and evals_result_ attributes can be empty or contain data. Moreover, when a worker is missing eval_set data entirely, training will fail once early_stopping_rounds becomes supported: local worker calls to model.fit(..., eval_data=None, early_stopping_rounds=x) will throw an exception.
See comment as to why early_stopping_rounds is explicitly omitted from this PR.
This adds 32 tests! I'm happy to trim it down a little bit to 16.
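The pre-submit check could look roughly like the sketch below. The function name `warn_on_missing_eval_parts`, the `"eval_set"` key, the worker addresses, and the warning text are assumptions for illustration; only the standard-library `warnings` module is used.

```python
import warnings
from typing import Any, Dict, List


def warn_on_missing_eval_parts(
    worker_parts: Dict[str, List[Dict[str, Any]]],
) -> List[str]:
    """Warn for each worker whose parts contain no eval_set data.

    Hypothetical helper: `worker_parts` maps a worker address to the list of
    part dictionaries allocated to it. Returns the workers that were flagged.
    """
    missing = [
        worker
        for worker, parts in worker_parts.items()
        if not any(part.get("eval_set") for part in parts)
    ]
    for worker in missing:
        warnings.warn(
            f"Worker {worker} was not allocated eval_set data. "
            "Evaluation results from this worker may be empty.",
            UserWarning,
        )
    return missing


worker_parts = {
    "tcp://worker-a": [{"data": "X0", "eval_set": [("Xv0", "yv0")]}],
    "tcp://worker-b": [{"data": "X1"}],
}
missing = warn_on_missing_eval_parts(worker_parts)
```

Running the check before any work is submitted means the user sees the warning even if training later succeeds on every worker.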

ffineis · 2021-06-20T04:17:05Z

I believe we should add the following note about custom eval function back to fit() method signature now.

LightGBM/python-package/lightgbm/sklearn.py

Line 268 in c738c83

_lgbmmodel_doc_custom_eval_note = """

Ah, you just mean copying the contents of this note, right?

Happy to duplicate. But could we just copy a link or say "see note for custom eval_metric functions in Sklearn API docs"?
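For context, the note in question documents the calling convention for custom eval functions in the scikit-learn API: the callable receives `y_true` and `y_pred` (optionally followed by `weight` and `group`) and returns `(eval_name, eval_result, is_higher_better)`. A minimal sketch of such a function, written in plain Python so it runs without NumPy, is shown below; the metric name `custom_mae` is made up for the example.

```python
from typing import Sequence, Tuple


def mean_absolute_error_metric(
    y_true: Sequence[float],
    y_pred: Sequence[float],
) -> Tuple[str, float, bool]:
    """Custom eval function returning (eval_name, eval_result, is_higher_better)."""
    n = len(y_true)
    total = sum(abs(float(t) - float(p)) for t, p in zip(y_true, y_pred))
    # Lower error is better, so is_higher_better is False.
    return "custom_mae", total / n, False


result = mean_absolute_error_metric([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
```

A function like this would be passed as `eval_metric` to `fit()`; the iteration over `zip(y_true, y_pred)` also works when the inputs are NumPy arrays.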

StrikerRUS · 2021-06-20T20:56:03Z

@ffineis

Ah, you just mean copying the contents of this note, right?

For this PR I'm totally fine with just one line of a concatenation in Dask code like this one

LightGBM/python-package/lightgbm/sklearn.py

Line 722 in c7134fa

) + "\n\n" + _lgbmmodel_doc_custom_eval_note

Ideally, I think we can templatize it like other docstrings with shape types later.

Or is it OK to use wording array-like for Dask Arrays?
cc @jameslamb

ffineis · 2021-06-21T04:02:56Z

AH ok thanks, this makes sense. Addressed in 5d4ddc8 unless James thinks the custom eval note should be reformatted like _lgbmmodel_doc_fit in this PR (I think this makes sense for another follow-up, as it will mostly be adding changes to sklearn.py).

StrikerRUS

We are in the process of migrating to f-strings: #4136.

Also, during rendering and checking my current suggestions I noticed that there are no Returns sections for fit() methods of Dask estimators. Is it OK?

python-package/lightgbm/dask.py

jameslamb · 2021-06-22T22:21:10Z

Or is it OK to use wording array-like for Dask Arrays?

I think that's ok. If it causes confusion in the future we can make it more Dask-specific.

Also, during rendering and checking my current suggestions I noticed that there are no Returns sections for fit() methods of Dask estimators. Is it OK?

I guess they should have a return block similar to those in the equivalent scikit-learn estimators, but there are no return sections for e.g. DaskLGBMClassifier.fit() on master (https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.DaskLGBMClassifier.html#lightgbm.DaskLGBMClassifier.fit), so it's not something this PR needs to fix.

StrikerRUS · 2021-06-23T13:42:19Z

@jameslamb

I guess they should have a return block similar to those in the equivalent scikit-learn estimators ...

Created #4402.

StrikerRUS

@ffineis Thank you so much for all your hard work! Very important enhancement.
LGTM!

ffineis · 2021-06-23T20:50:47Z

Thanks @StrikerRUS !! Appreciate the thorough vetting.

jameslamb

I did another review tonight, looks good to me! I noticed one thing but it's very small, so I'm going to approve / merge this and open a follow-up PR for it.

Thank you SO MUCH for your help with this very impactful contribution to the Dask interface.

github-actions · 2023-08-23T19:18:02Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

ffineis and others added 30 commits February 8, 2021 23:15

es WiP, need to add eval_sample_weight and eval_group

08430cb

add weight, group to dask es. WiP.

2b75814

dask es reorg

77766e9

Update python-package/lightgbm/dask.py

86341fc

_train_part model.fit args to lines Co-authored-by: James Lamb <jaylamb20@gmail.com>

Update tests/python_package_test/test_dask.py

8d82646

_train_part model.fit args to lines, pt2 Co-authored-by: James Lamb <jaylamb20@gmail.com>

Update python-package/lightgbm/dask.py

69110b4

_train_part model.fit args to lines pt3 Co-authored-by: James Lamb <jaylamb20@gmail.com>

Update tests/python_package_test/test_dask.py

69b45dd

dask_model.fit args to lines Co-authored-by: James Lamb <jaylamb20@gmail.com>

Update tests/python_package_test/test_dask.py

ee5f157

Co-authored-by: James Lamb <jaylamb20@gmail.com>

Update python-package/lightgbm/dask.py

c96be2f

use is instead of id() Co-authored-by: James Lamb <jaylamb20@gmail.com>

Update python-package/lightgbm/dask.py

04040fc

Co-authored-by: James Lamb <jaylamb20@gmail.com>

Update python-package/lightgbm/dask.py

cf720c9

Co-authored-by: James Lamb <jaylamb20@gmail.com>

Update python-package/lightgbm/dask.py

c0d8ae6

Co-authored-by: James Lamb <jaylamb20@gmail.com>

Update tests/python_package_test/test_dask.py

5d19b50

Co-authored-by: James Lamb <jaylamb20@gmail.com>

Update tests/python_package_test/test_dask.py

4421736

Co-authored-by: James Lamb <jaylamb20@gmail.com>

Update python-package/lightgbm/dask.py

65f98e5

Co-authored-by: James Lamb <jaylamb20@gmail.com>

Update python-package/lightgbm/dask.py

6d14586

Co-authored-by: James Lamb <jaylamb20@gmail.com>

Update python-package/lightgbm/dask.py

6737acd

Co-authored-by: James Lamb <jaylamb20@gmail.com>

applying changes to eval_set PR WiP

7099204

dask support for eval_names, eval_metric, eval_stopping_rounds

18cd42c

add evals_result checks and other eval_set attribute-related test checks. need to merge master - WiP

1335591

resolve conflix

c67cae5

fix lint errors in test_dask.py

609d8a0

drop group_shape from _lgbmmodel_doc_fit.format for non-rankers, add support for eval_at for dask ranker

0102bc7

add eval_at to test_dask eval_set ranker tests

b03715a

add back group_shape to lgbmmmodel docs, tighten tests

dfb3c72

drop random eval weights from early stopping, probably causing training to terminate too early

82cf159

add eval data templates to sklearn fit docs, add eval data docs to dask

8341074

add n_features to _create_data, eval_set tests stop w/ desirable tree counts

d232b0b

import alphabetically

e56251d

resolve merge conflix

10f6e58

ffineis added 2 commits June 17, 2021 14:09

make requested changes to test_dask.py

703e6f7

remove Optional() wrapper on eval_at

6da34f6

ffineis mentioned this pull request Jun 20, 2021

Use or return all workers eval_set evaluation data #4392

Closed

ffineis closed this Jun 20, 2021

ffineis reopened this Jun 20, 2021

ffineis mentioned this pull request Jun 20, 2021

Drop 'not evaluated' placeholder from dask.py #4393

Closed

ffineis added 3 commits June 20, 2021 22:23

Merge branch 'master' into dask/eval_sets

97c1bf2

add _lgbmmodel_doc_custom_eval_note to dask.py fit.__doc__

5d4ddc8

fix ordering of .sklearn imports to attempt lint fix

028727e

StrikerRUS reviewed Jun 21, 2021

jameslamb mentioned this pull request Jun 22, 2021

training eval_set does not default to "training" in Dask #4394

Open

ffineis and others added 3 commits June 22, 2021 22:33

dask custom eval note to f-string pt1

7d36e22

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

dask custom eval note to f-string pt 2

e493362

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

dask custom eval note to f-string pt 3

efb1825

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

StrikerRUS mentioned this pull request Jun 23, 2021

[python] Add Returns section to docstrings of fit() methods of Dask estimators #4402

Closed

StrikerRUS approved these changes Jun 23, 2021

jameslamb approved these changes Jun 28, 2021

jameslamb changed the title ~~[Dask] eval_sets~~ [dask] add support for eval sets and custom eval functions Jun 28, 2021

jameslamb merged commit b5502d1 into microsoft:master Jun 28, 2021

This was referenced Jun 28, 2021

[dask] fix typehint on _pad_eval_names() #4413

Merged

How to print best_iteration, best_score with lightgbm.DaskLGBMClassifier model? #4417

Closed

jameslamb mentioned this pull request Jul 28, 2021

[dask] early_stopping_rounds for dask #4493

Closed

github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
