
Re-introduce double buffer in UpdatePosition, to fix perf regression in gpu_hist #6757

Merged: 3 commits merged into dmlc:master on Mar 18, 2021

Conversation

@hcho3 (Collaborator) commented on Mar 17, 2021:

Closes #6552 by partially reverting commit f779980.

Review comment on removed lines 162-163:
dh::TemporaryArray<bst_node_t> position_temp(position_a_.size());
dh::TemporaryArray<RowIndexT> ridx_temp(ridx_a_.size());
@hcho3 (Collaborator, Author) commented:

I suspect building TemporaryArray at every invocation of UpdatePosition gets expensive in the particular example provided.
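
For context, here is a minimal host-side sketch of the two allocation patterns discussed above, with plain std::vector standing in for device memory. It only illustrates the idea, not XGBoost's actual RowPartitioner code; the class and method names (Partitioner, UpdatePositionNaive, UpdatePositionDoubleBuffered, Scatter) are hypothetical.

#include <cstddef>
#include <cstdint>
#include <vector>

using RowIndexT = std::uint32_t;
using bst_node_t = std::int32_t;  // stand-in for the type used in the diff above

class Partitioner {
 public:
  explicit Partitioner(std::size_t n_rows)
      : position_a_(n_rows), position_b_(n_rows),
        ridx_a_(n_rows), ridx_b_(n_rows) {}

  // Pattern removed by f779980: a fresh temporary buffer is built on every
  // call, so each UpdatePosition pays for two allocations.
  void UpdatePositionNaive() {
    std::vector<bst_node_t> position_temp(position_a_.size());
    std::vector<RowIndexT> ridx_temp(ridx_a_.size());
    Scatter(position_a_, ridx_a_, &position_temp, &ridx_temp);
    position_a_.swap(position_temp);
    ridx_a_.swap(ridx_temp);
  }

  // Double-buffer pattern this PR re-introduces: the second buffer is
  // allocated once up front, and each call only scatters and swaps.
  void UpdatePositionDoubleBuffered() {
    Scatter(position_a_, ridx_a_, &position_b_, &ridx_b_);
    position_a_.swap(position_b_);
    ridx_a_.swap(ridx_b_);
  }

 private:
  // Placeholder for the real partitioning kernel, which would write the
  // reordered node positions and row indices into the output buffers.
  static void Scatter(const std::vector<bst_node_t>& pos_in,
                      const std::vector<RowIndexT>& ridx_in,
                      std::vector<bst_node_t>* pos_out,
                      std::vector<RowIndexT>* ridx_out) {
    *pos_out = pos_in;
    *ridx_out = ridx_in;
  }

  std::vector<bst_node_t> position_a_, position_b_;
  std::vector<RowIndexT> ridx_a_, ridx_b_;
};

The naive variant does the same work per call plus two buffer allocations, which is why the per-invocation cost can dominate when UpdatePosition is called very frequently.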

@trivialfis (Member) left a comment:

Could you please share some benchmark results?

@hcho3 (Collaborator, Author) commented on Mar 17, 2021:

@trivialfis The example posted in #6552 won't complete in a reasonable amount of time with the latest master (or with 1.3.0). I can leave it running overnight and see if it finishes by tomorrow.

On the other hand, here's the result with the proposed fix:

[0]     convergence-merror:0.10066
[1]     convergence-merror:0.10053
[2]     convergence-merror:0.09998
[3]     convergence-merror:0.09928
[4]     convergence-merror:0.09851
[5]     convergence-merror:0.09770
[6]     convergence-merror:0.09689
[7]     convergence-merror:0.09601
[8]     convergence-merror:0.09517
[9]     convergence-merror:0.09438
Time elapsed = 673.7375219200039 s

@trivialfis (Member) commented on Mar 17, 2021:

Thanks for sharing. What about a more conventional dataset?

@trivialfis (Member) commented:

> I can leave it running overnight and see if it finishes by tomorrow.

Nah, no need. I'm just worried that there might be a regression on other types of data.

@hcho3 (Collaborator, Author) commented on Mar 17, 2021:

> What about a more conventional dataset?

Any suggestions? Should I try gbm-bench?

Review threads on src/tree/gpu_hist/row_partitioner.cu and src/tree/gpu_hist/row_partitioner.cuh (outdated) were resolved.
@trivialfis (Member) commented:

> Any suggestions? Should I try gbm-bench?

That would be great!

@trivialfis (Member) left a comment:

Looks good as long as there's no regression on other datasets.

@hcho3 (Collaborator, Author) commented on Mar 18, 2021:

@trivialfis Here are the benchmark results on gbm-bench. I do not see any performance degradation:

Dataset    Runtime before (s)    Runtime after (s)
airline    40.44                 40.57
bosch      2.77                  2.77
covtype    4.37                  4.38
epsilon    16.12                 16.19
fraud      0.44                  0.41
higgs      6.50                  6.53
year       0.75                  0.72

@hcho3 merged commit 4230dcb into dmlc:master on Mar 18, 2021.
@hcho3 deleted the use_double_buffer branch on Mar 18, 2021.
Linked issue (may be closed by this pull request): Training behaviour difference between v1.1.0 and v1.3.1