Produces a libtimemory-ncclp.so library that uses GOTCHA wrappers around ~12 NCCL functions.
Four functions are provided for C, C++, and Fortran:
uint64_t timemory_start_ncclp()
- Returns the number of initializations
uint64_t timemory_stop_ncclp(uint64_t idx)
- Removes the initialization request at idx
- Returns the number of remaining initializations
void timemory_register_ncclp()
- Ensures a global initialization exists until it is deregistered
void timemory_deregister_ncclp()
- Deactivates the global initialization
The environment variable ENABLE_TIMEMORY_NCCLP (default: "ON") controls configuration of the instrumentation.
This library configures the tim::user_ncclp_bundle component with the components specified by the following environment variables, listed in order of priority:
TIMEMORY_NCCLP_COMPONENTS
TIMEMORY_MPIP_COMPONENTS
TIMEMORY_PROFILER_COMPONENTS
TIMEMORY_GLOBAL_COMPONENTS
TIMEMORY_COMPONENT_LIST_INIT
When one of the above environment variables is set to "none", the priority search for component configurations is abandoned.
The following will result in NCCL functions being instrumented with cpu_clock:
export TIMEMORY_NCCLP_COMPONENTS="cpu_clock"
export TIMEMORY_PROFILER_COMPONENTS="peak_rss"
export TIMEMORY_GLOBAL_COMPONENTS="wall_clock"
The following will result in NCCL functions containing no instrumentation:
export TIMEMORY_NCCLP_COMPONENTS="none"
export TIMEMORY_PROFILER_COMPONENTS="peak_rss"
export TIMEMORY_GLOBAL_COMPONENTS="wall_clock"
The following will result in NCCL functions being instrumented with wall_clock and page_rss:
export TIMEMORY_NCCLP_COMPONENTS=""
export TIMEMORY_GLOBAL_COMPONENTS="wall_clock,page_rss"