Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Dataset::CopySubrow from group-wise to column-wise #3720

Merged
merged 1 commit into from
Jan 25, 2021

Conversation

shiyu1994
Copy link
Collaborator

This is to fix the efficiency problem reported by #3637. When bagging is used, Dataset::CopySubrow multi-threads by feature groups, which is not quite reasonable when a large multi-value feature group exists.

This PR changes the group-wise multi-threading into column-wise multi-threading.

@guolinke
Copy link
Collaborator

guolinke commented Jan 7, 2021

any benchmark numbers?

@shiyu1994
Copy link
Collaborator Author

Set bagging_fraction=0.4 and bagging_freq=1, and test on the large datasets. This PR speedups the training significantly in MS LTR and Allstate. No obvious effect is observed in other datasets. The metric values are identical with the master, except for MS LTR. I think this PR is ready for merge. But I think we need look into why MS LTR produces 4 different results.

Dataset row-wise-master time row-wise-this-pr time row-wise speedup col-wise-master time col-wise-this-pr time col-wise speedup
higgs 124.89±0.50 124.96±2.16 1.00 152.19±0.69 151.04±0.69 1.01
yahoo 54.41±0.36 54.32±0.55 1.00 62.78±0.75 62.21±0.35 1.01
msltr 190.29±0.85 106.66±0.68 1.78 202.39±4.46 117.24±0.70 1.73
dataexpo_onehot 103.52±0.43 103.70±0.67 1.00 123.75±5.09 120.02±1.57 1.03
allstate 276.00±2.57 204.78±0.51 1.35 307.40±3.08 234.90±2.20 1.31
year 37.49±0.59 37.21±0.50 1.01 17.10±0.22 17.05±0.24 1.00
Dataset row-wise-master metric row-wise-this-pr metric col-wise-master metric col-wise-this-pr metric
higgs 0.847046 0.847046 0.847046 0.847046
yahoo 0.746130 0.746130 0.746130 0.746130
msltr 0.501659 0.501729 0.501306 0.501659
dataexpo_onehot 0.774887 0.774887 0.774887 0.774887
allstate 0.609139 0.609139 0.609139 0.609139
year 9.010180 9.010180 9.010180 9.010180

@guolinke guolinke merged commit 3653167 into microsoft:master Jan 25, 2021
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants