Release Version 2.2.0 · UO-OACISS/apex

This release contains many updates and fixes. Of note are new support for CUDA/CUPTI events and the ability to detect MPI applications even when HPX or APEX isn't configured with MPI support.

Changes:

• Change to personal fork of concurrentqueue for stability
• Cleaning up clang pedantic errors
• Tweaking build system to support Windows
• Merge pull request #122 from STEllAR-GROUP/fixing_windows_support
• Adding annotation for process_profiles task
• Cleaning up the dot/graphviz output
• Adding "untied timers" option. With this option enabled, a profiler can be started on one OS thread and stopped on another. APEX won't keep track of the profiler stack.
• Fixing unit conversion when writing out TAU profiles
• Add capture of /proc/self/status Threads value
• Capture the number of OS context switches
• Cleaning up thread swap test
• Adding additional error messages to PAPI component support
• Debugging PAPI error checking
• Updating to support binutils 2.34 API changes, adding pthread.h include header where needed
• Updating deprecated HPX headers
• First step in adding CUDA support: adding a CUDA example and adding CUDA/CUPTI headers through CMake.
• Adding another CUDA example
• Working kernel measurement
• Basic callback and activity support enabled
• Done with initial implementation
• Disable thread affinity for HPX configurations
• Minor change to support running in an MPI environment when MPI is not used by HPX or the APEX configuration. This happens when HPX is configured without a parcel port, and APEX thinks all ranks are 0. This change adds a check for MPI environment variables to validate the MPI rank that was passed in.
• Adding MPI rank/size detection support for MPICH, which also covers MVAPICH, Intel, Cray, etc. Also added some PBS/Torque support, but unfortunately PBS/Torque doesn't provide an environment variable that specifies the total number of ranks. In the future, a special APEX environment variable could specify the total number of ranks, if needed.
• Merge branch 'cuda_support' of github.com:khuck/xpress-apex into cuda_support
• Adding CUDA task dependency support
• Task dependency working! When GPU callbacks are made, we map the correlation ID to the task_wrapper associated with the parent, so the GPU activity can be linked to the parent that launched it. Also added two more examples.
• Working CUDA support with task graphs and correct annotations. This commit contains a nasty bug in task_identifier, where any identifier string gets modified "in place" when demangled. That can cause problems later if a map of those task_identifiers is modified. This will be merged to develop when the full support with tracing is merged.
• Adding basic CUDA counters for kernels and memory transfers.
• Adding HPX config support for CUDA/CUPTI
• Minor typo in HPX configuration
• More changes for HPX support
• Testing with CUDA 10.1 and fixing config. Testing with older CUDA revealed that some installations are laid out differently.
• Fixing bugs in shutdown. During shutdown, the asynchronous buffers were processed after the static strings that some labels depended on had gone out of scope, so the strings got corrupted. This is fixed by using const char * strings instead of const std::string&. Also, the counters add far too much overhead, so they are now optional.
• Adding Google Chrome trace event support
• Working (rudimentary) Google Trace Event support. This support only handles timers, no counters (yet).
• Merge branch 'chrome_trace_event' into develop
• Fixing implementation of public profile processing function to work with gcc 8
• Minor change to add cudart to the link
• Merge branch 'cuda_support' of https://github.com/khuck/xpress-apex into cuda_support
• Minor changes to CUDA support and Google trace. The Google trace support needs to be refactored, but otherwise this seems to be working.
• Merge branch 'cuda_support' into develop
• Fixing time units in trace output
• Cleaning up trace event output, making it more compact
• Fixing Demangle/DEMANGLE inconsistency in CMake
• Fixing DEMANGLE on a real computer
• Fixing trace_event file creation for MPI runs
• Fixing OTF2 clock to use new timestamps. Also updated GPU example to create many streams.
• Adding unified memory support. It needs to be initialized AFTER cuInit.
• Merge branch 'cuda_support' into cuda_and_trace_event
• Fixing context for unified memory events
• Cleaning up CUDA support, removing dead code
• Forgot to add trace_event_listener.cpp to CMakeLists.hpx
• Write trace events during shutdown, not in the destructor
• Adding clock delta to account for difference between CPU and GPU clocks
• CUPTI processing should ignore APEX non-worker threads
• Cleaning up parent assignment and finalization. Always assign the parent, if it's available. During finalization, don't do anything until after checking whether APEX is disabled.
• Added get_num_workers() routine to get just the worker count.
• Assign a thread ID for all threads. The CUDA/CUPTI asynchronous processing thread needs to be able to generate GUIDs for asynchronous tasks, and to generate GUIDs, the thread needs an ID. So, always assign an ID, but only increment the number of workers if the new thread is actually a worker.
• Adding unified memory counter support and fixing parent. Adding two counters for unified memory page faults. Also, when profilers for async events are created, pass in the task wrapper, which will call the right profiler constructor.
• Minimizing trace output, adding GUIDs as args. Changing all timers to complete events for compactness. Writing GUID and parent GUID values for all timers. Added metadata tags for processes and threads so that they are sorted correctly.
• Removing debug message
• Adding OTF2 support for CUDA! CUDA offloaded events are now supported in APEX when writing out to OTF2. Still to do: the stream "threads" need to be annotated as GPU threads and given device/context/stream labels.
• Cleaning up trace event stream names
• Fixing thread labels for GPU and CPU threads
• Fixing event unification at the end of OTF2 tracing. When region names have spaces in them, the C++ istringstream parser will split them. Instead, just read the whole line into a string and split on the tab between the region ID and the name.
• Adding support for cudaMalloc* bytes
• Fixing race conditions in startup when the PAPI NVML module starts making CUDA calls before APEX is ready to profile them.
• Adding support for CUDA device API. In addition to the CUDA runtime API, the device API can also be wrapped with callbacks. Use APEX_CUDA_DEVICE_API=1 with or without APEX_CUDA_RUNTIME_API=1 (enabled by default) to see the low-level CUDA function calls.
• Fixing MPI thread reduction for OTF2
• Shortening test that is crashing unexpectedly
• Removing APEX assertions from profiler.hpp. apex_assert.h doesn't get installed when building with HPX, so don't include it in profiler.hpp.
• Fixing minor bugs, removing printf, and adding per-core utilization. There's a new option, APEX_PROC_STAT_DETAILS, that shows per-core (really, per-HW-thread) utilization percentage. It's the total of all states minus idle, divided by the total. Requested by DCA++ performance tests. Also partially fixing initialization in APEX. I don't think we want to automatically call apex::init() and apex::finalize() as global constructors or destructors, but the option is still there.
• Minor fix to prevent unification crash
• Removing files
• Cleaning up CUPTI code and removing debug message from OTF2 listener
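
The two MPI commits above validate the rank that HPX passes to APEX by reading the MPI launcher's environment. A minimal C++ sketch of that idea follows. The environment variable names are assumptions, not the confirmed list APEX checks: Open MPI exports OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE, and Hydra-based MPICH launchers export PMI_RANK and PMI_SIZE. The names launcher_info, detect_launcher_rank() and validate_rank() are hypothetical.

```cpp
#include <cstdlib>

// Hypothetical result type: rank and size as reported by the launcher.
struct launcher_info {
    int rank = 0;
    int size = 1;
    bool found = false;
};

// Assumed variable names (not necessarily the exact list APEX checks):
// Open MPI exports OMPI_COMM_WORLD_*, Hydra-based MPICH launchers export PMI_*.
launcher_info detect_launcher_rank() {
    static const char* const names[][2] = {
        {"OMPI_COMM_WORLD_RANK", "OMPI_COMM_WORLD_SIZE"},
        {"PMI_RANK", "PMI_SIZE"},
    };
    launcher_info info;
    for (const auto& pair : names) {
        const char* rank = std::getenv(pair[0]);
        const char* size = std::getenv(pair[1]);
        if (rank != nullptr && size != nullptr) {
            info.rank = std::atoi(rank);
            info.size = std::atoi(size);
            info.found = true;
            break;  // first launcher that matches wins
        }
    }
    return info;
}

// Prefer the launcher's rank over the one passed in: when HPX runs without
// a parcel port it reports rank 0 everywhere, which this corrects.
int validate_rank(int reported_rank) {
    const launcher_info info = detect_launcher_rank();
    return info.found ? info.rank : reported_rank;
}
```

Under this sketch, a launcher that exports a rank but no total (the PBS/Torque case described above) falls through to the rank that was passed in.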
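
The task dependency commit links each GPU activity back to the host task that launched it through the CUPTI correlation ID: the API callback runs on the launching thread, while the activity record arrives later on a buffer-processing thread. The sketch below shows one way to hold that mapping, assuming shared ownership of the task object. Here task_wrapper is a hypothetical stand-in for APEX's class of the same name, correlation_map is an illustrative name, and erasing each entry on lookup assumes one activity record per API call.

```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for APEX's task_wrapper: just enough to identify the parent.
struct task_wrapper {
    std::string name;
    std::uint64_t guid;
};

// Hypothetical map from CUPTI correlation IDs to the launching host task.
class correlation_map {
public:
    // Called from the API callback, on the CPU thread that made the CUDA call.
    void record(std::uint32_t correlation_id, std::shared_ptr<task_wrapper> parent) {
        std::lock_guard<std::mutex> lock(mutex_);
        map_[correlation_id] = std::move(parent);
    }

    // Called when the asynchronous activity record is processed.
    // Returns the parent and forgets the entry; nullptr if the ID is unknown.
    std::shared_ptr<task_wrapper> take(std::uint32_t correlation_id) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = map_.find(correlation_id);
        if (it == map_.end()) return nullptr;
        std::shared_ptr<task_wrapper> parent = std::move(it->second);
        map_.erase(it);
        return parent;
    }

private:
    std::mutex mutex_;  // callbacks and buffer processing run on different threads
    std::unordered_map<std::uint32_t, std::shared_ptr<task_wrapper>> map_;
};
```

Holding a shared_ptr keeps the parent alive until its GPU activity has been attributed, even if the host task finishes first.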
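
The clock delta commit accounts for the CPU and GPU clocks having different origins. Assuming a single back-to-back pair of readings is a good enough calibration, the correction reduces to a constant offset added to every GPU timestamp. The struct clock_delta is hypothetical; with CUPTI, the GPU reading could come from cuptiGetTimestamp(), but here both readings are passed in as plain nanosecond values.

```cpp
#include <cstdint>

// Hypothetical helper: constant offset mapping GPU timestamps (ns) onto the CPU clock (ns).
struct clock_delta {
    std::int64_t offset_ns = 0;

    // Calibrate from a CPU reading and a GPU reading taken back to back.
    void calibrate(std::uint64_t cpu_now_ns, std::uint64_t gpu_now_ns) {
        offset_ns = static_cast<std::int64_t>(cpu_now_ns) -
                    static_cast<std::int64_t>(gpu_now_ns);
    }

    // Shift a GPU timestamp onto the CPU timeline; the offset may be negative.
    std::uint64_t to_cpu_time(std::uint64_t gpu_ts_ns) const {
        return static_cast<std::uint64_t>(static_cast<std::int64_t>(gpu_ts_ns) + offset_ns);
    }
};
```

A real implementation might re-sample periodically to absorb drift between the two clocks; this sketch calibrates once.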
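
The trace output commits switch all timers to "complete" events and add metadata for naming and sorting threads. In the Chrome Trace Event format, a complete event has ph set to "X" and carries both a start time ts and a duration dur in microseconds, so one record replaces a begin/end pair; metadata events use ph "M". A minimal sketch of the JSON records follows, assuming timestamps are already integer microseconds and names need no JSON escaping. The args keys guid and parent_guid and all three function names are hypothetical.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Complete event (ph "X"): one record with start ts and duration dur, in microseconds.
std::string complete_event(const std::string& name, int pid, int tid,
                           std::uint64_t ts_us, std::uint64_t dur_us,
                           std::uint64_t guid, std::uint64_t parent_guid) {
    std::ostringstream os;
    os << "{\"name\":\"" << name << "\",\"ph\":\"X\""
       << ",\"pid\":" << pid << ",\"tid\":" << tid
       << ",\"ts\":" << ts_us << ",\"dur\":" << dur_us
       // Hypothetical arg keys for the task GUID and its parent's GUID.
       << ",\"args\":{\"guid\":" << guid
       << ",\"parent_guid\":" << parent_guid << "}}";
    return os.str();
}

// Metadata event (ph "M") that gives a thread a display name, e.g. a GPU stream.
std::string thread_name_event(int pid, int tid, const std::string& label) {
    std::ostringstream os;
    os << "{\"name\":\"thread_name\",\"ph\":\"M\",\"pid\":" << pid
       << ",\"tid\":" << tid << ",\"args\":{\"name\":\"" << label << "\"}}";
    return os.str();
}

// Metadata event that fixes where the thread appears in the viewer's list.
std::string thread_sort_event(int pid, int tid, int sort_index) {
    std::ostringstream os;
    os << "{\"name\":\"thread_sort_index\",\"ph\":\"M\",\"pid\":" << pid
       << ",\"tid\":" << tid << ",\"args\":{\"sort_index\":" << sort_index << "}}";
    return os.str();
}
```

Each string is one element of the traceEvents array; writing GPU streams as their own tids with a name and sort index is one way to keep them grouped below the CPU threads.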
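
The OTF2 unification fix replaces whitespace-delimited parsing with line-based parsing: reading the region name with operator>> stops at the first space, while reading the whole line and splitting at the tab keeps a name such as "cudaMemcpy async transfer" intact. A sketch, assuming one "<id>\t<name>" entry per line; read_region_names() is a hypothetical name.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical parser for "<region id>\t<region name>" lines. The name is
// everything after the first tab, so embedded spaces survive.
std::map<std::uint64_t, std::string> read_region_names(std::istream& in) {
    std::map<std::uint64_t, std::string> regions;
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type tab = line.find('\t');
        if (tab == std::string::npos) continue;  // no tab: not a region entry
        std::uint64_t id = 0;
        std::istringstream id_field(line.substr(0, tab));
        if (!(id_field >> id)) continue;  // ID is not numeric: skip the line
        regions[id] = line.substr(tab + 1);
    }
    return regions;
}
```

Stream extraction is still fine for the numeric ID, because the ID field has already been cut off at the tab.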
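
APEX_PROC_STAT_DETAILS reports per-core utilization as the total of all states minus idle, divided by the total. On Linux, each cpuN line of /proc/stat lists cumulative time counters per state, with idle as the fourth value. The sketch applies the formula to the difference between two readings of the same line, which is an assumption: the note does not say whether APEX uses deltas. The names cpu_sample, parse_proc_stat_line() and utilization_percent() are hypothetical.

```cpp
#include <cstdint>
#include <numeric>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one "cpuN" line of /proc/stat: a label followed by
// cumulative counters in the order user, nice, system, idle, iowait, irq, ...
struct cpu_sample {
    std::string label;
    std::vector<std::uint64_t> states;
};

cpu_sample parse_proc_stat_line(const std::string& line) {
    cpu_sample sample;
    std::istringstream in(line);
    in >> sample.label;
    std::uint64_t value = 0;
    while (in >> value) sample.states.push_back(value);
    return sample;
}

// Utilization between two samples of the same CPU, as a percentage:
// (delta of all states - delta of idle) / delta of all states.
double utilization_percent(const cpu_sample& before, const cpu_sample& after) {
    const auto total_of = [](const std::vector<std::uint64_t>& v) {
        return std::accumulate(v.begin(), v.end(), std::uint64_t{0});
    };
    const std::uint64_t total = total_of(after.states) - total_of(before.states);
    const std::uint64_t idle = after.states.at(3) - before.states.at(3);  // index 3 = idle
    if (total == 0) return 0.0;  // no ticks elapsed between the samples
    return 100.0 * static_cast<double>(total - idle) / static_cast<double>(total);
}
```

For example, if 200 ticks elapse on cpu0 between two readings and 100 of them are idle, the sketch reports 50% utilization for that core.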
