[GraphBolt][CUDA] Inplace pin memory for Graph and TorchFeatureStore #6962
Conversation
@frozenbugs, I measured the memory consumption of the multi-GPU example in #6961. Without this PR, the consumption grows as more GPUs are used. With this PR, adding more GPUs does not significantly change the memory consumption. The tests pass as well. The multi-GPU example also seems to terminate cleanly.
We saw better scale-up performance on a single DGX node from 1 GPU to 8 GPUs with this PR.
Why does this PR lead to better performance? I thought it would only help lower the memory usage. |
Will find out whether it is just this PR or a combined effect with other PRs.
Description
Torch's `pin_memory()` method creates a copy of the tensor. When we work with large datasets or use multi-GPU training, we don't want copies to be made. So, this PR makes the `pin_memory_()` method in-place by using `cudaHostRegister`.
Checklist
Please feel free to remove inapplicable items for your PR.
Changes