Possible GPU memory usage increase. #5285

Closed
trivialfis opened this issue Feb 5, 2020 · 7 comments

Comments

@trivialfis
Member

trivialfis commented Feb 5, 2020

It seems #5093 has some impact on GPU memory usage. After that commit I encountered an OOM error on the mortgage dataset with 4 Tesla V100 16GB GPUs. @rongou

My benchmark script: https://github.com/trivialfis/dxgb_bench

@trivialfis
Member Author

I will investigate it further later. I haven't confirmed it yet.

@rongou
Contributor

rongou commented Feb 5, 2020

Keep me posted. Are you using sampling anywhere?

@trivialfis
Member Author

Nope. Just normal benchmarking.

@rongou
Contributor

rongou commented Feb 5, 2020

I went through the PR again and I don't see any place that allocates more GPU memory. In fact, we should use slightly less memory, since the gradient pairs are no longer duplicated. Strange...

@trivialfis
Member Author

Don't worry. I will look into it later. Will keep you posted.

@trivialfis
Member Author

@RAMitchell I found the new device sketching uses more memory than the old one even after b745b7a.

Before the rewrite of device sketching, running mortgage 1 year with dask:

[20:36:08] ======== Device 0 Memory Allocations:  ========
[20:36:08] Peak memory usage: 9604MiB
[20:36:08] Number of allocations: 9399
[20:36:08] ======== Device 0 Memory Allocations:  ========
[20:36:08] Peak memory usage: 10217MiB
[20:36:08] Number of allocations: 9399

After:

[21:34:05] ======== Device 0 Memory Allocations:  ========
[21:34:05] Peak memory usage: 13377MiB
[21:34:05] Number of allocations: 9818
[21:34:05] ======== Device 0 Memory Allocations:  ========
[21:34:05] Peak memory usage: 15057MiB
[21:34:05] Number of allocations: 9818
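
To put a number on the difference, here is a minimal sketch that parses the two log excerpts above and compares the highest reported peak in each. The helper names (`peaks_mib`, `max_peak_increase`) are made up for illustration and are not part of XGBoost or dxgb_bench; the only inputs are the log lines quoted in this comment.

```python
import re

# Log excerpts copied verbatim from the comment above.
BEFORE = """\
[20:36:08] ======== Device 0 Memory Allocations:  ========
[20:36:08] Peak memory usage: 9604MiB
[20:36:08] Number of allocations: 9399
[20:36:08] ======== Device 0 Memory Allocations:  ========
[20:36:08] Peak memory usage: 10217MiB
[20:36:08] Number of allocations: 9399
"""

AFTER = """\
[21:34:05] ======== Device 0 Memory Allocations:  ========
[21:34:05] Peak memory usage: 13377MiB
[21:34:05] Number of allocations: 9818
[21:34:05] ======== Device 0 Memory Allocations:  ========
[21:34:05] Peak memory usage: 15057MiB
[21:34:05] Number of allocations: 9818
"""

PEAK_RE = re.compile(r"Peak memory usage: (\d+)MiB")


def peaks_mib(log):
    """Return every 'Peak memory usage' value in the log, in MiB."""
    return [int(value) for value in PEAK_RE.findall(log)]


def max_peak_increase(before, after):
    """Compare the highest peak of each log: (absolute MiB, relative ratio)."""
    before_max = max(peaks_mib(before))
    after_max = max(peaks_mib(after))
    delta = after_max - before_max
    return delta, delta / before_max


delta, ratio = max_peak_increase(BEFORE, AFTER)
print(f"{delta} MiB more at peak ({ratio:.0%} increase)")
# prints "4840 MiB more at peak (47% increase)"
```

Comparing the maximum of each excerpt is one choice among several; comparing the reports pairwise gives 3773 MiB and 4840 MiB respectively.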

@RAMitchell
Member

It's designed to use up to 80% of memory without going OOM. Using more memory can be a good thing, as it runs faster. It's only a problem if it goes OOM, right?
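
As a rough illustration of what an 80% budget means on the cards from this thread, the sketch below compares the reported peaks against 80% of a 16 GiB device. It rests on two assumptions that the thread does not confirm: that "16GB" can be treated as 16 × 1024 MiB, and that the 80% is measured against total device memory rather than against whatever is free when sketching starts. The function names are hypothetical and nothing here calls into XGBoost.

```python
def memory_budget_mib(total_mib, fraction=0.8):
    """MiB available under a fractional budget of the device total."""
    return int(total_mib * fraction)


def headroom_mib(peak_mib, total_mib, fraction=0.8):
    """Positive if the peak fits inside the budget, negative if it exceeds it."""
    return memory_budget_mib(total_mib, fraction) - peak_mib


# Assumption: a "16GB" V100 is treated as 16 GiB = 16384 MiB.
TOTAL_MIB = 16 * 1024

# Peak values reported in the "After" log earlier in this thread.
for peak in (13377, 15057):
    print(f"peak {peak} MiB, headroom {headroom_mib(peak, TOTAL_MIB)} MiB")
```

The peaks in the logs are whole-process figures, so they may include allocations outside the sketching step; a negative headroom here does not by itself show that the sketching budget was exceeded.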

3 participants