Numerical instability with auc_mu validation and large datasets #3201

Closed
thadeuluiz opened this issue Jul 2, 2020 · 3 comments · Fixed by #3209

@thadeuluiz

How are you using LightGBM?

LightGBM component: Python package

Environment info

Operating System: Fedora 29

CPU/GPU model: Intel(R) Xeon(R) Gold 5120 CPU

C++ compiler version: g++ (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2)

CMake version: 3.14.5

Java version: OpenJDK Runtime Environment (build 1.8.0_232-b09)

Python version: Python 3.6.10 -- on conda

R version: N/A

Other:

LightGBM version or commit hash: 2.3.2 / 2e2757f

Error message and / or logs

When using auc_mu metric with a large dataset, validation score varies wildly outside the range [0, 1].
Validation metrics reported by cv hovered around -27, or 4.5 in some cases (at least for my data).

Running the example described below, this is the first output:

[1] cv_agg's auc_mu: 14.9488 + 0.0245611
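For context on why a value like 14.9 cannot be a valid score: an AUC-style metric is a fraction of correctly ordered pairs, so it is bounded by 0 and 1. The sketch below is a plain NumPy pairwise AUC for two classes, not LightGBM's auc_mu implementation; the function name `pairwise_auc` is a hypothetical helper introduced only for illustration.

```python
import numpy as np

def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    wins = (pos > neg).sum()
    ties = (pos == neg).sum()
    # .size values are Python ints, so this product cannot overflow.
    return (wins + 0.5 * ties) / (pos.size * neg.size)

# Random scores carry no signal, so the result should sit near 0.5.
rng = np.random.default_rng(0)
auc = pairwise_auc(rng.standard_normal(500), rng.standard_normal(500))
```

Because the numerator never exceeds the number of pairs, any result outside [0, 1] points to an arithmetic problem rather than to model quality.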

Reproducible example(s)

It only takes two lines, plus the imports:

import lightgbm as lgb
import numpy as np

data = lgb.Dataset(np.random.randn(3000000, 15), np.random.randint(0, 9, 3000000))
lgb.cv({'objective': 'multiclass', 'num_class': 9, 'metric': 'auc_mu'}, data, verbose_eval=True)

Interestingly, running the same code with 10x less data gives the expected result, an AUC of about 0.5:

data = lgb.Dataset(np.random.randn(300000, 15), np.random.randint(0, 9, 300000))
lgb.cv({'objective': 'multiclass', 'num_class': 9, 'metric': 'auc_mu'}, data, verbose_eval=True)

yields [1] cv_agg's auc_mu: 0.499315 + 0.000975809
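The two runs differ only in row count, which suggests a size-dependent arithmetic issue. The thread does not state the root cause; one hypothesis that fits the reported numbers is that a count of cross-class pairs is held in a 32-bit signed integer. The sketch below checks that hypothesis with plain Python arithmetic. It assumes the `lgb.cv` default of 5 folds and perfectly balanced classes, and the helper `cross_class_pairs` is hypothetical; it does not reproduce LightGBM's internals.

```python
INT32_MAX = 2**31 - 1

def cross_class_pairs(n_rows, n_classes=9, nfold=5):
    """Return (exact, wrapped) pair counts for one pair of classes in one validation fold.

    Assumption: classes are perfectly balanced and one fold is held out.
    """
    per_class = n_rows // nfold // n_classes
    exact = per_class * per_class
    # Emulate two's-complement wrap-around of a 32-bit signed integer.
    wrapped = (exact + 2**31) % 2**32 - 2**31
    return exact, wrapped

large_exact, large_wrapped = cross_class_pairs(3_000_000)
small_exact, small_wrapped = cross_class_pairs(300_000)

# If a true score of 0.5 were normalised by the wrapped count instead of the exact one:
inflated = 0.5 * large_exact / large_wrapped
```

Under these assumptions the smaller dataset stays below the 32-bit limit while the larger one does not, and dividing by the wrapped count inflates a true 0.5 by more than an order of magnitude, which is in the same region as the reported 14.9488. This is consistent with the symptom, not proof of the cause.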

@StrikerRUS
Collaborator

@btrotta Can you please take a look?

@btrotta
Collaborator

btrotta commented Jul 5, 2020

@thadeuluiz Thanks for reporting this and providing a reproducible example. I'll look into it.

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023