Slowdown with sparse data + bagging on versions 3+ #3637

fbwyx · 2020-12-09T20:08:23Z

How are you using LightGBM?

LightGBM component: Python package

Environment info

Operating System: macOS 10.14.6 (also seen on Ubuntu 18.04, though I didn't compile the master branch there)

CPU/GPU model: x86-64/No GPU

C++ compiler version:

$ clang -v
Apple LLVM version 10.0.1 (clang-1001.0.46.4)
Target: x86_64-apple-darwin18.7.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Java version: None

CMake version:

$ cmake --version
cmake version 3.19.1

CMake suite maintained and supported by Kitware (kitware.com/cmake).

Python version:

$ python -V
Python 3.7.9

R version: None

LightGBM version or commit hash: 3.0, 3.1, and 44a6fb7ffa646b469fc10475b3526c61239682ac (latest master as of writing this)

Error message and / or logs

None

Reproducible example(s)

Running this script:

# bagging.py
import lightgbm as lgb
import numpy as np
from scipy import sparse

print(lgb.__version__)

x = sparse.rand(5_000_000, 10, density=0.25, dtype=np.float32, format="csr", random_state=123)
y = np.ravel(x.sum(axis=1) > 1)

classifier = lgb.LGBMClassifier(
    n_estimators=100,
    objective="binary",
    num_threads=4,
    bagging_freq=1,
    bagging_fraction=0.5,
    seed=123,
)
classifier.fit(x, y)

on 2.3.1 gives:

$ time python bagging.py
2.3.1
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=1.0 will be ignored. Current value: bagging_fraction=0.5
[LightGBM] [Warning] bagging_freq is set=1, subsample_freq=0 will be ignored. Current value: bagging_freq=1
[LightGBM] [Warning] num_threads is set=4, n_jobs=-1 will be ignored. Current value: num_threads=4

real    0m27.869s
user    1m30.511s
sys     0m0.977s

but on master gives:

$ time python bagging.py
3.1.1.99
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=1.0 will be ignored. Current value: bagging_fraction=0.5
[LightGBM] [Warning] bagging_freq is set=1, subsample_freq=0 will be ignored. Current value: bagging_freq=1
[LightGBM] [Warning] num_threads is set=4, n_jobs=-1 will be ignored. Current value: num_threads=4

real    0m55.539s
user    3m12.913s
sys     0m2.495s

We see similar slowdowns on our production data; I'll note that on 2.3.1 LightGBM easily saturates all cores, but on 3.1 it appears to get stuck using 1-1.5 cores.
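
As a rough cross-check of the two `time` outputs above, the following sketch computes the wall-clock slowdown and the average CPU seconds consumed per wall-clock second. The helper names (`parse_time`, `summarize`) are made up for this illustration, and the figures are copied from the runs above.

```python
import re


def parse_time(value: str) -> float:
    """Convert a shell `time` field such as '1m30.511s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized time format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Figures copied from the two runs in this report.
runs = {
    "2.3.1": {"real": "0m27.869s", "user": "1m30.511s", "sys": "0m0.977s"},
    "master": {"real": "0m55.539s", "user": "3m12.913s", "sys": "0m2.495s"},
}


def summarize(run):
    """Return (wall-clock seconds, CPU seconds per wall-clock second)."""
    real = parse_time(run["real"])
    cpu = parse_time(run["user"]) + parse_time(run["sys"])
    return real, cpu / real


real_old, busy_old = summarize(runs["2.3.1"])
real_new, busy_new = summarize(runs["master"])
slowdown = real_new / real_old

print(f"wall-clock slowdown: {slowdown:.2f}x")
print(f"CPU seconds per wall second: 2.3.1={busy_old:.2f}, master={busy_new:.2f}")
```

On these numbers the wall-clock time roughly doubles. The CPU-per-wall ratio should be read with care: CPU time reported by `time` may include threads that are spin-waiting rather than doing useful work, so a high ratio on the synthetic data does not necessarily contradict the low core usage observed on production data.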


guolinke · 2020-12-10T02:09:03Z

@shiyu1994 can you help to check the efficiency of bagging?

shiyu1994 · 2021-01-04T07:05:03Z

The synthesized dataset has only one multi-value feature group, so the parallel loop over feature groups in Dataset::CopySubrow has a single iteration (num_groups_ == 1) and runs on one thread.

LightGBM/src/io/dataset.cpp

Lines 789 to 801 in d1014ea

    
void Dataset::CopySubrow(const Dataset* fullset,
                         const data_size_t* used_indices,
                         data_size_t num_used_indices, bool need_meta_data) {
  CHECK_EQ(num_used_indices, num_data_);
  OMP_INIT_EX();
#pragma omp parallel for schedule(static)
  for (int group = 0; group < num_groups_; ++group) {
    OMP_LOOP_EX_BEGIN();
    feature_groups_[group]->CopySubrow(fullset->feature_groups_[group].get(),
                                       used_indices, num_used_indices);
    OMP_LOOP_EX_END();
  }
  OMP_THROW_EX();

PR #3720 should fix this.
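
To illustrate why a loop over groups cannot use more than one thread here, below is a minimal Python sketch of an even, contiguous split of loop iterations across threads, in the spirit of `schedule(static)`. It is a simplified model, not LightGBM or OpenMP code: `static_schedule` and `busy_threads` are hypothetical helpers, the exact chunk assignment in a real OpenMP runtime may differ, and treating the 10 feature columns of the repro as the unit of work for a column-wise split is an assumption about how the fix distributes work.

```python
from typing import List


def static_schedule(num_iterations: int, num_threads: int) -> List[range]:
    """Split iterations into one contiguous chunk per thread, as evenly as possible."""
    base, extra = divmod(num_iterations, num_threads)
    chunks = []
    start = 0
    for thread in range(num_threads):
        # The first `extra` threads take one additional iteration.
        size = base + (1 if thread < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def busy_threads(chunks: List[range]) -> int:
    """Count the threads that received at least one iteration."""
    return sum(1 for chunk in chunks if len(chunk) > 0)


NUM_THREADS = 4   # num_threads in the repro script
NUM_GROUPS = 1    # num_groups_ reported for the synthetic dataset
NUM_COLUMNS = 10  # feature columns in the repro (assumed unit of a column-wise split)

group_wise = static_schedule(NUM_GROUPS, NUM_THREADS)
column_wise = static_schedule(NUM_COLUMNS, NUM_THREADS)

print("group-wise chunk sizes: ", [len(c) for c in group_wise])
print("column-wise chunk sizes:", [len(c) for c in column_wise])
print("busy threads:", busy_threads(group_wise), "vs", busy_threads(column_wise))
```

With a single group, only one thread receives an iteration and the other three have nothing to do, regardless of how many rows must be copied. Splitting over a finer unit gives every thread a share of the work.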

StrikerRUS · 2021-03-27T21:49:16Z

@shiyu1994 Can we close this issue?

fbwyx · 2021-03-30T21:02:00Z

Confirmed that the regression is gone on our real dataset. In fact we got quite a nice speedup! 😄 Thanks for the fix!

StrikerRUS · 2021-03-30T21:13:24Z

Really nice to hear that!

github-actions · 2023-08-23T14:46:20Z

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

shiyu1994 self-assigned this Dec 10, 2020

shiyu1994 mentioned this issue Jan 4, 2021

Change Dataset::CopySubrow from group-wise to column-wise #3720

Merged

StrikerRUS mentioned this issue Jan 28, 2021

v3.2.0 release #3872

Merged

fbwyx closed this as completed Mar 30, 2021

github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
