Numerical instability with auc_mu validation and large datasets #3201

thadeuluiz · 2020-07-02T06:05:17Z

How are you using LightGBM?

LightGBM component: Python package

Environment info

Operating System: Fedora 29

CPU/GPU model: Intel(R) Xeon(R) Gold 5120 CPU

C++ compiler version: g++ (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2)

CMake version: 3.14.5

Java version: OpenJDK Runtime Environment (build 1.8.0_232-b09)

Python version: Python 3.6.10 -- on conda

R version: N/A

Other:

LightGBM version or commit hash: 2.3.2 / 2e2757f

Error message and / or logs

When using the auc_mu metric with a large dataset, the validation score falls far outside the expected range [0, 1].
The validation metrics reported by cv hovered around -27, or 4.5 in some cases (at least for my data).

Running the example described below, this is the first output:

[1] cv_agg's auc_mu: 14.9488 + 0.0245611

Reproducible example(s)

It only takes two lines, plus the imports:

import numpy as np
import lightgbm as lgb

data = lgb.Dataset(np.random.randn(3000000, 15), np.random.randint(0, 9, 3000000))
lgb.cv({'objective': 'multiclass', 'num_class': 9, 'metric': 'auc_mu'}, data, verbose_eval=True)

Interestingly, running the same code with one tenth of the data gives the expected result, an AUC of about 0.5:

data = lgb.Dataset(np.random.randn(300000, 15), np.random.randint(0, 9, 300000))
lgb.cv({'objective': 'multiclass', 'num_class': 9, 'metric': 'auc_mu'}, data, verbose_eval=True)

yields [1] cv_agg's auc_mu: 0.499315 + 0.000975809


StrikerRUS · 2020-07-02T13:12:36Z

@btrotta Can you please take a look?

btrotta · 2020-07-05T04:36:58Z

@thadeuluiz Thanks for reporting this and providing a reproducible example. I'll look into it.

github-actions · 2023-08-23T22:47:51Z

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

StrikerRUS assigned btrotta Jul 2, 2020

btrotta mentioned this issue Jul 5, 2020

Fix integer overflow in auc_mu (fixes #3201) #3209

Merged

guolinke closed this as completed in #3209 Jul 7, 2020

github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
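
The fix is titled "Fix integer overflow in auc_mu", but the thread does not say where the overflow occurs. As a rough sketch only, assume that the number of cross-class pairs (the product of two class sizes) was held in a 32-bit signed integer, and that lgb.cv used its default of 5 folds. Under those assumptions, the snippet below compares the exact pair count with its 32-bit wrapped value for both dataset sizes from the report. The helper names `wrap_int32` and `pair_count` are hypothetical and are not part of LightGBM.

```python
import ctypes

INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Reinterpret an integer as a 32-bit signed value (two's-complement wraparound)."""
    return ctypes.c_int32(value).value


def pair_count(rows_in_fold, num_class=9):
    """Return (exact, wrapped) pair counts for two equally sized classes.

    Assumption: labels are uniform, so each class holds rows_in_fold // num_class rows.
    """
    per_class = rows_in_fold // num_class
    exact = per_class * per_class
    return exact, wrap_int32(exact)


# Assumption: 5 folds, so each validation fold holds one fifth of the rows.
for total_rows in (300_000, 3_000_000):
    exact, wrapped = pair_count(total_rows // 5)
    print(total_rows, exact, wrapped, "overflows" if exact > INT32_MAX else "fits")
```

With the smaller dataset the product fits in 32 bits, while with the larger one it exceeds the 32-bit limit and wraps to a much smaller number. Dividing by such a wrapped count would push a normalized score outside [0, 1], which is at least consistent with the symptoms reported above.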
