[dask] Support pred_contrib in Dask predict() methods (fixes #3713) #3774

jameslamb · 2021-01-17T06:15:01Z

The predict() method on Dask model objects doesn't correctly handle pred_contrib=True today. It fails to return the full matrix of feature contributions.

Thanks to @pseudotensor for pointing out this bug (#3713).

This PR fixes that.
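For reference, here is a minimal sketch of the output shape that predict(pred_contrib=True) is expected to produce. It follows the expected_num_cols logic in the test snippet further down: one column per feature plus one "base value" column, repeated for each class in multi-class models. The helper name expected_contrib_shape is hypothetical and not part of LightGBM's API.

```python
def expected_contrib_shape(n_rows: int, n_features: int, n_classes: int = 1) -> tuple:
    """Hypothetical helper: shape of the pred_contrib matrix LightGBM returns.

    Each model output gets one column per feature plus one "base value"
    column. Regression (n_classes=1) and binary classification (n_classes=2)
    have a single output; multi-class models have one output per class.
    """
    if n_rows < 0 or n_features < 1 or n_classes < 1:
        raise ValueError("need n_rows >= 0, n_features >= 1, n_classes >= 1")
    n_outputs = n_classes if n_classes > 2 else 1
    return (n_rows, (n_features + 1) * n_outputs)


# Regression or binary: 4 features -> 5 columns.
print(expected_contrib_shape(100, 4))               # (100, 5)
# Multi-class with 3 classes: 3 blocks of 5 columns each.
print(expected_contrib_shape(100, 4, n_classes=3))  # (100, 15)
```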

Notes for Reviewers

I found that the results of predict(pred_contrib=True) are different between the Dask interface and sklearn model objects, if n_workers in the Dask cluster is greater than 1. I observed that both feature contribution values and the "base value" in the pred_contrib output are often different. This is true for regression, binary classification, and multi-class classification. The differences are larger than what I think could be attributed to numeric precision issues.

@guolinke , should I expect that pred_contrib outputs are different between multi-machine training and single-machine training? I'm unsure if this is because of an issue in the Dask interface or if it's something that is LightGBM-wide.
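One way to narrow this down: LightGBM's feature contributions are TreeSHAP values, which are additive, so for any single model each row of the pred_contrib output (feature contributions plus the base value) should sum to that row's raw score from predict(raw_score=True). Below is a minimal pure-Python sketch of that check. It assumes multi-class output is laid out as one (n_features + 1)-wide block per class with the base value last in each block, and that raw scores are passed as one row per sample (a 1-element row for regression and binary models). If the Dask and sklearn outputs each pass against their own model's raw scores, the differences more likely come from the distributed and single-machine models themselves being different than from how the Dask interface assembles pred_contrib results. The function name is hypothetical.

```python
import math
from typing import Sequence


def contribs_match_raw_scores(
    contribs: Sequence[Sequence[float]],
    raw_scores: Sequence[Sequence[float]],
    n_features: int,
    tol: float = 1e-6,
) -> bool:
    """Hypothetical check: do contributions plus base value sum to the raw score?

    contribs: one row per sample with (n_features + 1) * n_outputs columns,
        assumed grouped as one block per output, base value last in each block.
    raw_scores: one row per sample with n_outputs values.
    """
    block = n_features + 1
    if len(contribs) != len(raw_scores):
        return False
    for contrib_row, raw_row in zip(contribs, raw_scores):
        if len(contrib_row) != block * len(raw_row):
            return False
        for k, raw in enumerate(raw_row):
            # Sum the k-th block: feature contributions plus its base value.
            total = math.fsum(contrib_row[k * block:(k + 1) * block])
            if not math.isclose(total, raw, rel_tol=0.0, abs_tol=tol):
                return False
    return True


# Two features, base value 1.0: 0.5 - 0.25 + 1.0 == 1.25
print(contribs_match_raw_scores([[0.5, -0.25, 1.0]], [[1.25]], n_features=2))  # True
```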

References

I found these conversations useful while working through this:

tests/python_package_test/test_dask.py

StrikerRUS · 2021-01-18T22:40:11Z

@jameslamb

The predict() method on Dask model objects doesn't correctly handle pred_contrib=True today.

Just to clarify: are raw_score=True and pred_leaf=True fully supported now? If yes, I think we need tests; if not, then feature requests.

LightGBM/python-package/lightgbm/sklearn.py

Lines 871 to 880 in d2c5545

    
    def predict(self, X, raw_score=False, start_iteration=0, num_iteration=None,
                pred_leaf=False, pred_contrib=False, **kwargs):
        """Docstring is inherited from the LGBMModel."""
        result = self.predict_proba(X, raw_score, start_iteration, num_iteration,
                                    pred_leaf, pred_contrib, **kwargs)
        if callable(self._objective) or raw_score or pred_leaf or pred_contrib:
            return result
        else:
            class_index = np.argmax(result, axis=1)
            return self._le.inverse_transform(class_index)
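The excerpt above shows why pred_contrib needs care in wrappers: the classifier only decodes probabilities into class labels when none of raw_score, pred_leaf, or pred_contrib is set, and otherwise returns the matrix as-is. Here is a minimal pure-Python sketch of that branching, which a per-partition Dask predict function would need to mirror. As simplifying assumptions, np.argmax and the fitted label encoder are replaced with plain list operations, the callable-objective case is left out, and the function name is hypothetical.

```python
from typing import Sequence


def postprocess_class_predictions(
    result: Sequence[Sequence[float]],
    classes: Sequence[str],
    raw_score: bool = False,
    pred_leaf: bool = False,
    pred_contrib: bool = False,
) -> list:
    """Hypothetical pure-Python mirror of LGBMClassifier.predict()'s branch."""
    if raw_score or pred_leaf or pred_contrib:
        # Raw scores, leaf indices, or feature contributions are returned
        # untouched; decoding them as class labels would be wrong.
        return [list(row) for row in result]
    # Otherwise `result` holds class probabilities: take the index of the
    # largest value per row (first one on ties, like np.argmax) and map it
    # back to the original class label.
    labels = []
    for row in result:
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(classes[best])
    return labels


print(postprocess_class_predictions([[0.1, 0.9], [0.8, 0.2]], ["neg", "pos"]))  # ['pos', 'neg']
```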

python-package/lightgbm/dask.py

StrikerRUS

Thanks for so actively improving the Dask module!
Please check my minor comments below.

python-package/lightgbm/dask.py

StrikerRUS · 2021-01-19T18:01:44Z

tests/python_package_test/test_dask.py

+    dask_classifier = dlgbm.DaskLGBMClassifier(
+        time_out=5,
+        local_listen_port=listen_port,
+        tree_learner='data'
+    )


Add n_estimators=10 and num_leaves=10?
#3786.

oh good idea

added in 6428589

ok I just added this again (was lost because of a bad merge conflict resolution, sorry)

StrikerRUS · 2021-01-19T18:08:11Z

tests/python_package_test/test_dask.py

+    else:
+        expected_num_cols = (dX.shape[1] + 1) * num_classes
+
+    if isinstance(dX, dask.dataframe.core.DataFrame):


I might be wrong, but according to the docs and the source code, we can use it without the core part, so that we don't depend on Dask's internal implementation.
https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame
https://github.com/dask/dask/blob/72304a94c98ace592f01df91e3d9e89febda307c/dask/dataframe/__init__.py#L3

Suggested change:

-    if isinstance(dX, dask.dataframe.core.DataFrame):
+    if isinstance(dX, dask.dataframe.DataFrame):

ooo that's a good idea, let me try that

added in 6428589 and it worked ok

ok added this back again (now as dd.DataFrame)

StrikerRUS · 2021-01-19T18:09:05Z

tests/python_package_test/test_dask.py

+    dask_regressor = dlgbm.DaskLGBMRegressor(
+        time_out=5,
+        local_listen_port=listen_port,
+        tree_learner='data'
+    )


Add n_estimators=10 and num_leaves=10?
#3786.

added in 6428589

added back again

jameslamb · 2021-01-19T20:58:06Z

Just to clarify: are raw_score=True and pred_leaf=True fully supported now? If yes, I think we need tests; if not, then feature requests.

I spent about an hour today trying to get the tests for raw_score and pred_leaf working. Both are only somewhat supported (for some combinations of task and input type). I faced several errors related to what I think is one root cause: when using map_blocks() or map_partitions(), you have to provide some metadata to help Dask understand the shape of the function call result. This is described in the meta param of https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.map_partitions, for example.
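The meta parameter mentioned above applies to DataFrame.map_partitions(); the analogous argument for dask.array.Array.map_blocks() is chunks, which describes the output block sizes when the mapped function does not preserve the input's shape, as is the case for pred_contrib. Below is a minimal pure-Python sketch that derives that structure from the input's chunks. It assumes the feature matrix is chunked along rows only (a single column chunk, so each block sees whole rows), that each block yields one output row per input row, and that all contribution columns land in a single column chunk. The helper name is hypothetical.

```python
from typing import Tuple

# Dask-style chunk structure: one tuple of block sizes per axis.
Chunks = Tuple[Tuple[int, ...], ...]


def contrib_output_chunks(input_chunks: Chunks, n_classes: int = 1) -> Chunks:
    """Hypothetical helper: output chunks for a pred_contrib map_blocks() call."""
    if len(input_chunks) != 2:
        raise ValueError("expected 2-D chunks: (row_chunks, column_chunks)")
    row_chunks, col_chunks = input_chunks
    if len(col_chunks) != 1:
        # Per-block prediction needs every feature of a row in the same block.
        raise ValueError("rechunk so all features are in one column chunk")
    n_features = col_chunks[0]
    # Regression and binary models have one output; multi-class has one per class.
    n_outputs = n_classes if n_classes > 2 else 1
    # Rows keep their partitioning; contribution columns form one column chunk.
    return (tuple(row_chunks), ((n_features + 1) * n_outputs,))


# 100 rows split into two blocks, 4 features in one column chunk.
print(contrib_output_chunks(((50, 50), (4,))))               # ((50, 50), (5,))
print(contrib_output_chunks(((50, 50), (4,)), n_classes=3))  # ((50, 50), (15,))
```

With Dask installed, the result could be passed as the chunks= argument, for example X.map_blocks(predict_fn, chunks=contrib_output_chunks(X.chunks), dtype=float), where predict_fn is a placeholder for a per-block prediction function.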

I'll write up feature requests and link them here and in #2302. I'm not sure whether raw_score and pred_leaf are truly unsupported, or whether I was just making mistakes in the tests.

So as of this PR, predict() will accept those parameters, but success isn't guaranteed.

UPDATE: added #3792 and #3793

StrikerRUS · 2021-01-20T16:39:20Z

UPDATE: added #3792 and #3793

Thank you so much!

StrikerRUS

It seems there were some merge conflicts, because some of my previous comments are not addressed even though you said they were.

tests/python_package_test/test_dask.py

python-package/lightgbm/dask.py

tests/python_package_test/test_dask.py

jameslamb · 2021-01-21T04:50:22Z

It seems there were some merge conflicts, because some of my previous comments are not addressed even though you said they were.

so weird! Yeah, maybe a bad resolution of a merge conflict, sorry.


StrikerRUS

Thanks, LGTM!

I'm afraid merging this PR will cause conflicts with #3708, so I'm not touching anything.

jameslamb · 2021-01-22T05:35:25Z

now that #3708 has been merged, I'll fix merge conflicts here and then merge this

StrikerRUS · 2021-01-23T20:19:42Z

@jameslamb Can we remove this strange branch?

jameslamb · 2021-01-23T20:22:58Z

yep definitely! I just deleted it, sorry

StrikerRUS · 2021-01-29T19:06:44Z

@jameslamb
Ah, it is here again! 😄

Can I remove it?

jameslamb · 2021-01-29T19:09:11Z

WHAT? I'm so confused. Yes, please remove it, I'm sorry. Maybe my remotes are set up wrong on some local clone.

StrikerRUS · 2021-01-29T19:15:45Z

No problem, thanks!

maybe my remotes are set up wrong on some local clone.

Perhaps... It is 2281 commits behind master, so it must be a VERY old setup. 🙂

github-actions · 2023-08-24T01:27:55Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

jameslamb added 2 commits January 15, 2021 17:28
adding pred_contrib support (e9a6b76)
add tests (a878417)

jameslamb added the feature label Jan 17, 2021

jameslamb requested review from guolinke and StrikerRUS January 17, 2021 06:15

linting (8447651)

StrikerRUS reviewed Jan 18, 2021: tests/python_package_test/test_dask.py (outdated, resolved)

jameslamb added 2 commits January 18, 2021 09:34
Merge branch 'master' into fix/dask-predcontrib (395f39a)
remove raw_score (440a30b)

jameslamb requested a review from StrikerRUS January 18, 2021 18:08

Merge branch 'master' into fix/dask-predcontrib (bff2cea)

StrikerRUS reviewed Jan 18, 2021: python-package/lightgbm/dask.py (outdated, resolved)

jameslamb added 3 commits January 18, 2021 20:32
Merge branch 'master' into fix/dask-predcontrib (b172762)
add pred kwargs (98a7878)
Merge branch 'master' into fix/dask-predcontrib (4397022)

StrikerRUS reviewed Jan 19, 2021

faster tests (6428589)

This was referenced Jan 19, 2021:
[dask] support 'pred_leaf' in predict() #3792 (closed)
[dask] support 'raw_score' in predict() #3793 (closed)

jameslamb requested a review from StrikerRUS January 19, 2021 21:23

Merge branch 'master' into fix/dask-predcontrib (82f8c55)

StrikerRUS requested changes Jan 20, 2021

jameslamb and others added 3 commits January 20, 2021 23:28
Apply suggestions from code review (90f0c2d), Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
changes to tests (b4ddfbf)
Merge branch 'fix/dask-predcontrib' of github.com:jameslamb/LightGBM into fix/dask-predcontrib (c76148f)

StrikerRUS reviewed Jan 21, 2021: tests/python_package_test/test_dask.py (outdated, resolved)

Update tests/python_package_test/test_dask.py (9b00ef5), Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

StrikerRUS approved these changes Jan 21, 2021

merge master (402af15)

jameslamb merged commit d9a96c9 into microsoft:master Jan 22, 2021

jameslamb deleted the fix/dask-predcontrib branch January 22, 2021 06:33

github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
