[tests][dask] fix argument names in custom eval function in Dask test #4833
Conversation
Thanks for looking into this! But I don't think the proposed change is correct.

When using, for example, `DaskLGBMRegressor`, `lightgbm.dask` will create one task running `LGBMRegressor.fit()` on each Dask worker.
LightGBM/python-package/lightgbm/dask.py, lines 765 to 767 in 2f5d898:

```python
futures_classifiers = [
    client.submit(
        _train_part,
```

LightGBM/python-package/lightgbm/dask.py, line 164 in 2f5d898:

```python
def _train_part(
```

LightGBM/python-package/lightgbm/dask.py, lines 319 to 329 in 2f5d898:

```python
model.fit(
    data,
    label,
    sample_weight=weight,
    init_score=init_score,
    eval_set=local_eval_set,
    eval_sample_weight=local_eval_sample_weight,
    eval_init_score=local_eval_init_score,
    eval_names=local_eval_names,
    **kwargs
)
```
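(As a miniature analogy of this dispatch pattern, assuming nothing about lightgbm's internals: each submitted task sees only a plain local partition of the data. The helper names below are illustrative only.)

```python
# Miniature analogy of the per-worker dispatch (NOT lightgbm's actual
# code): each submitted task receives only its plain local partition,
# just as _train_part runs fit() on local data inside each Dask worker.
from concurrent.futures import ThreadPoolExecutor

def train_part(partition):
    # inside the task, the data is an ordinary local list
    assert isinstance(partition, list)
    return sum(partition) / len(partition)

def run():
    data = list(range(100))
    partitions = [data[:50], data[50:]]  # one partition per "worker"
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(train_part, p) for p in partitions]
        return [f.result() for f in futures]

print(run())  # [24.5, 74.5]
```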
At that moment, that evaluation function will be called only on local data in each worker process, which will be numpy arrays, not Dask arrays.

Here's some sample code proving that. Note that it includes an `assert` which will fail if `y_true` is anything other than a numpy array.
Sample code:

```python
import dask.array as da
import numpy as np
from dask.distributed import Client, LocalCluster, wait
from lightgbm.dask import DaskLGBMRegressor
from sklearn.datasets import make_regression

# set up Dask cluster
n_workers = 2
cluster = LocalCluster(n_workers=n_workers)
client = Client(cluster)
client.wait_for_workers(n_workers)
print(f"View the dashboard: {cluster.dashboard_link}")

# create training data and an additional eval set
def _make_dataset(n_samples):
    X, y = make_regression(n_samples=n_samples)
    dX = da.from_array(X, chunks=(1000, X.shape[1]))
    dy = da.from_array(y, chunks=1000)
    return dX, dy

# training data
dX, dy = _make_dataset(10_000)

# eval data
dX_e, dy_e = _make_dataset(2_000)

def _custom_metric(y_true, y_pred):
    metric_name = "custom_metric"
    is_higher_better = False
    metric_value = 0.708
    assert str(type(y_true)) == "<class 'numpy.ndarray'>"
    return metric_name, metric_value, is_higher_better

dask_reg = DaskLGBMRegressor(
    client=client,
    objective="regression_l2",
    tree_learner="data"
)
dask_reg.fit(
    X=dX,
    y=dy,
    eval_set=[
        (dX, dy),
        (dX_e, dy_e)
    ],
    eval_metric=["rmse", _custom_metric]
)

# confirm that custom metric function was actually called
# (should be an array of value 0.708)
print(dask_reg.evals_result_["valid_0"]["custom_metric"])
```
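For reference, a standalone sketch of the eval-metric contract the sample above relies on: a callable `(y_true, y_pred)` returning `(name, value, is_higher_better)`. The function name and the hand-rolled RMSE here are illustrative only, not code from this PR; in lightgbm, `y_true` and `y_pred` arrive as numpy arrays.

```python
# Hedged sketch of the custom eval metric contract:
# callable (y_true, y_pred) -> (metric_name, metric_value, is_higher_better).
# "custom_rmse" is a hypothetical name; the RMSE is hand-rolled for
# illustration and works on any element-wise iterable inputs.
import math

def custom_rmse(y_true, y_pred):
    # element-wise squared errors, then the root of their mean
    sq_err = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return "custom_rmse", math.sqrt(sum(sq_err) / len(sq_err)), False

name, value, higher_better = custom_rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(name, round(value, 4), higher_better)
```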
Ah, thank you very much! I was confused by the metrics in our tests: LightGBM/tests/python_package_test/test_dask.py, lines 222 to 236 in 2f5d898. They all take arguments named …

ah yep, you're right! That is confusing. I'd support removing the …

Great! I reverted the docstring changes and pushed a fix for the argument names.
RTD builds for visual checks: https://lightgbm.readthedocs.io/en/note_shape/.