Track GPU memory consumption explicitly #137

Closed
khuck opened this issue Mar 1, 2021 · 1 comment
Comments

@khuck
Collaborator

khuck commented Mar 1, 2021

Currently, APEX records the size of each cudaMalloc call, but doesn't track the total amount allocated. For the total, it relies on periodic sampling of NVML counters, which can create blind spots between samples. To avoid these blind spots, we should optionally track the actual cudaMalloc and cudaFree locations and amounts. Each cudaMalloc call will increment an atomic counter of allocated bytes and insert an entry into a map, with the address as the key and the size as the value. Each cudaFree call will then use the address to look up the allocated size and decrement the atomic counter. This will be an optional feature, to avoid perturbation from contention on the map and the counter. Each malloc and free will also produce an event in the OTF2 trace.
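
The bookkeeping described above could look roughly like the following. This is a minimal sketch, not the code that landed in APEX: the class name `DeviceMemoryTracker` and its methods `on_alloc`, `on_free`, and `total_bytes` are hypothetical, the map is guarded by a plain mutex (one possible choice among several), and the OTF2 event emission is left out. It uses no CUDA calls, so the hooks would need to be invoked from whatever callback observes cudaMalloc and cudaFree.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <unordered_map>

// Hypothetical helper for illustration only; not APEX's actual implementation.
class DeviceMemoryTracker {
public:
    // Call after a successful allocation of `bytes` at address `ptr`.
    void on_alloc(const void* ptr, std::size_t bytes) {
        if (ptr == nullptr) return;
        std::lock_guard<std::mutex> lock(mutex_);
        auto [it, inserted] = sizes_.emplace(ptr, bytes);
        if (!inserted) {
            // Address seen again without a matching free: replace the stale size.
            total_bytes_.fetch_sub(it->second, std::memory_order_relaxed);
            it->second = bytes;
        }
        total_bytes_.fetch_add(bytes, std::memory_order_relaxed);
    }

    // Call when `ptr` is freed. Returns the bytes released, or 0 if unknown.
    std::size_t on_free(const void* ptr) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = sizes_.find(ptr);
        if (it == sizes_.end()) return 0;
        const std::size_t bytes = it->second;
        sizes_.erase(it);
        total_bytes_.fetch_sub(bytes, std::memory_order_relaxed);
        return bytes;
    }

    // Lock-free read of the current total, usable from a sampling thread.
    std::size_t total_bytes() const {
        return total_bytes_.load(std::memory_order_relaxed);
    }

private:
    std::mutex mutex_;                                    // guards sizes_
    std::unordered_map<const void*, std::size_t> sizes_;  // address -> size
    std::atomic<std::size_t> total_bytes_{0};             // running total
};
```

The mutex is where the contention mentioned above would show up, which is one reason to keep the feature optional. The atomic counter lets a reader obtain the total without taking the lock.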

khuck added a commit that referenced this issue Mar 12, 2021
Now explicitly tracking all memory allocations and frees on both
the host and the device.
@khuck
Collaborator Author

khuck commented Mar 12, 2021

Fixed with 7e37b10
