Releases: UO-OACISS/apex

Release v2.6.5 with bug fixes

22 Feb 17:53
  • view commit • Cleaning up kokkos tuning verbosity, shortening simulated annealing minimum iterations.
  • view commit • Adding common tree construction at end of MPI execution. This will speed up analysis with python and the apex-treesummary.py script.
  • view commit • Rocm 5.7.0 is missing librocprofiler64.so, so don't expect it to be there. But don't fail if it isn't.
  • view commit • Adding explicit rocm_smi memory check
  • view commit • Adding extra calls to query HIP memory periodically, and to query SMI memory at alloc/free points.
  • view commit • Make sure defines are defined correctly.
  • view commit • Don't include rocm_smi in ompt code.
  • view commit • Cleaning up some race condition crashes during short tests
  • view commit • Removing unused variable
  • view commit • Fixing bug where tasktree header isn't written to csv file for non-MPI runs
  • view commit • Reporting tcmalloc preload error when detected. We could probably automatically detect that tcmalloc is a dependency and preload it automatically...
  • view commit • Adding concurrency options to apex_exec. Also allowing all options to have either underscores or dashes.
  • view commit • Disabling APEX_BUILD_OMPT, this addresses PR #177 with the correct fix. There's no need to build the LLVM runtime to support the non-compliant GCC compiler.
  • view commit • Removing OMPT build setting from CI
  • view commit • Adding Kokkos unit test, but only enable it if APEX builds Kokkos as a submodule. We can't guarantee that the installed Kokkos will only provide host support. This allows us to test Kokkos support on CI.
  • view commit • Fixing tree post-processing script to assume that the task tree is common across all ranks, this speeds up processing a bunch.
  • view commit • Adding warning that OMPT got re-initialized
  • view commit • Fixing startup issues when launched with gdb, or just in general
  • view commit • Enabled headless batch GDB processing to catch crashes with mpirun.
  • view commit • Fixing bug in computing receive bytes for non-root ranks and getting a segv because the pointer is null
  • view commit • Fixing a bug in OMPT support where openmp regions happen after finalize - the kokkos runtime uses openmp regions in destructors.
  • view commit • Fixing symbol collision between ompt and hip support
  • view commit • Resetting the static tree node count after each dump, needed for merging common tree of tasktree data.
  • view commit • Only have a threadpool the size of the allowable cores, not all of them
  • view commit • Fixing shutdown bug when finalize is called without dump on frontier
  • view commit • Updating readme
  • view commit • Updating documentation
  • view commit • Updating CI build options
  • view commit • adding documentation link
  • view commit • adding documentation link
  • view commit • Debugging Kokkos autotuning of occupancy hints on CUDA. Lots of silly mistakes.
  • view commit • Forgot to add source files for test
  • view commit • Forgot another test source file
  • view commit • Force Kokkos fencing enabled when doing autotuning.
  • view commit • Flushing trace at dump.
  • view commit • Fixing bug in trace buffer flush at exit, and reducing minimum iterations for simulated annealing
  • view commit • Updating documentation for 2.6.4 release
  • view commit • Updating documentation for 2.6.4 release
  • view commit • Changing default of Kokkos tuning to false
  • view commit • Updating binutils support to work with modern compilers
  • view commit • Fixing binutils hash
  • view commit • Lots of fixes for tracking memory leaks. Found a few issues with memory tracking on Frontier. These fixes allow us to delay memory tracking until after apex::dump() has been called some number of times (configurable), and fix some symbol resolution. This also fixes some trace output for when we crash before exit.
  • view commit • Updating version number for 2.6.5 patch release
  • view commit • Merge branch 'develop'
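
The thread pool change listed above ("Only have a threadpool the size of the allowable cores, not all of them") rests on the difference between the cores a machine has and the cores a process is allowed to use. The sketch below illustrates that distinction in Python. It is not APEX's implementation (APEX is C++), and the function name `allowable_core_count` is hypothetical.

```python
import os


def allowable_core_count() -> int:
    """Return how many cores this process may run on, not the machine total."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: reflects the affinity mask set by taskset or a batch scheduler.
        return len(os.sched_getaffinity(0))
    # Fallback for platforms without an affinity query (for example macOS).
    return os.cpu_count() or 1


# Size a worker pool from the allowed cores rather than from every core.
pool_size = allowable_core_count()
print(f"worker pool size: {pool_size}")
```

On a node where a job launcher binds each rank to a subset of cores, sizing the pool this way avoids oversubscribing the cores the rank actually owns.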
    Release v2.6.4 with bug fixes

    07 Feb 00:26

    New release for v2.6.4, providing bug fixes.

  • view commit • Removing Kokkos header and making kokkos a build dependency, either externally if available or as a git submodule.
  • view commit • Adding Kokkos submodule support for HPX integrated build.
  • view commit • Cleaning up Kokkos submodule support for HPX integrated build.
  • view commit • Fixing compiler warning by checking return value.
  • view commit • Fixing CMake capitalization for Kokkos
  • view commit • Fix deprecated CMake cache variables
  • view commit • Cleaning up kokkos tuning verbosity, shortening simulated annealing minimum iterations.
  • view commit • Adding common tree construction at end of MPI execution. This will speed up analysis with python and the apex-treesummary.py script.
  • view commit • Rocm 5.7.0 is missing librocprofiler64.so, so don't expect it to be there. But don't fail if it isn't.
  • view commit • Adding explicit rocm_smi memory check
  • view commit • Adding extra calls to query HIP memory periodically, and to query SMI memory at alloc/free points.
  • view commit • Make sure defines are defined correctly.
  • view commit • Don't include rocm_smi in ompt code.
  • view commit • Cleaning up some race condition crashes during short tests
  • view commit • Removing unused variable
  • view commit • Fixing bug where tasktree header isn't written to csv file for non-MPI runs
  • view commit • Reporting tcmalloc preload error when detected. We could probably automatically detect that tcmalloc is a dependency and preload it automatically...
  • view commit • Adding concurrency options to apex_exec. Also allowing all options to have either underscores or dashes.
  • view commit • Disabling APEX_BUILD_OMPT, this addresses PR #177 with the correct fix. There's no need to build the LLVM runtime to support the non-compliant GCC compiler.
  • view commit • Removing OMPT build setting from CI
  • view commit • Merge branch 'develop' into fix-cmake-deprecated-variables
  • view commit • Merge pull request #178 from Pansysk75/fix-cmake-deprecated-variables
  • view commit • Fixing tree post-processing script to assume that the task tree is common across all ranks, this speeds up processing a bunch.
  • view commit • Adding warning that OMPT got re-initialized
  • view commit • Fixing startup issues when launched with gdb, or just in general
  • view commit • Enabled headless batch GDB processing to catch crashes with mpirun.
  • view commit • Fixing bug in computing receive bytes for non-root ranks and getting a segv because the pointer is null
  • view commit • Fixing a bug in OMPT support where openmp regions happen after finalize - the kokkos runtime uses openmp regions in destructors.
  • view commit • Fixing symbol collision between ompt and hip support
  • view commit • Resetting the static tree node count after each dump, needed for merging common tree of tasktree data.
  • view commit • Only have a threadpool the size of the allowable cores, not all of them
  • view commit • Fixing shutdown bug when finalize is called without dump on frontier
  • view commit • Updating readme
  • view commit • Updating documentation
  • view commit • Updating CI build options
  • view commit • Adding Kokkos unit test, but only enable it if APEX builds Kokkos as a submodule. We can't guarantee that the installed Kokkos will only provide host support. This allows us to test Kokkos support on CI.
  • view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
  • view commit • adding documentation link
  • view commit • adding documentation link
  • view commit • Debugging Kokkos autotuning of occupancy hints on CUDA. Lots of silly mistakes.
  • view commit • Forgot to add source files for test
  • view commit • Forgot another test source file
  • view commit • Force Kokkos fencing enabled when doing autotuning.
  • view commit • Flushing trace at dump.
  • view commit • Fixing bug in trace buffer flush at exit, and reducing minimum iterations for simulated annealing
  • view commit • Updating documentation for 2.6.4 release
  • view commit • Updating documentation for 2.6.4 release
    Release v2.6.3 with bug fixes

    12 Jul 15:25
  • view commit • Fixing typo in usage message for apex_exec
  • view commit • Add .circleci/config.yml
  • view commit • Update config.yml
  • view commit • Updated config.yml
  • view commit • Updated config.yml
  • view commit • Updated config.yml
  • view commit • Updated config.yml
  • view commit • Updated config.yml
  • view commit • Updated config.yml
  • view commit • Updated config.yml
  • view commit • Updated config.yml
  • view commit • Updated config.yml
  • view commit • Merge pull request #174 from UO-OACISS/circleci-project-setup
  • view commit • Update README.md
  • view commit • fixed return before error message
  • view commit • Check return value from getline
  • view commit • Adding throw attributes to malloc, calloc, free, realloc
  • view commit • Fixing compiler warnings
  • view commit • Removing throw attributes from malloc, free, realloc, calloc
  • view commit • Adding return code check for system call
  • view commit • Updating readme
  • view commit • Adding sleep so that the very short tests don't crash when finalizing during the startup of the proc_read thread
  • view commit • Fixing bug in apex-summary.py script for default behavior
  • view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
  • view commit • fixing bug in apex-summary.py, adding regular expressions for --drop and --keep flags in the apex-treesummary.py
  • view commit • Adding additional test output on failures
  • view commit • Adding --verbose option, and fixing a bug in the --keep processing to prevent double-adding the root node
  • view commit • Allowing for multiple root node pdfs to be written by apex-treesummary.py
  • view commit • Debugging summary scripts
  • view commit • Re-enabling older json hatchet output
  • view commit • Making sure flow events are unique across processes.
  • view commit • Adding PAPI output for CSV tasktree output
  • view commit • Make sure that ompt target callback data is added to maps before the asynchronous buffer is processed - some events may end up in the buffer before the synchronous event has finished.
  • view commit • Updating location of ActiveHarmony download
  • view commit • Fixing OMPT support for older implementations
  • view commit • Removing unused variable
  • view commit • Fixing stop after finalization error and issue with 0 length OMPT timers
  • view commit • Fixing CMake issues with IntelLLVM, memory wrapper output file naming and not ld_preloading when calling env during apex_exec.
  • view commit • Fixing compiler warnings from nvhpc on Polaris
  • view commit • Fixing simulated annealing minimum iterations
  • view commit • Updating version
    APEX Version 2.6.2

    21 Mar 18:03

    Several changes to the APEX code structure, including separation of HIP, CUDA, OMPT, LEVEL0, PERFETTO and MPI support into separate libraries. This allows a single configure/build on a system, with support for different features added dynamically at runtime. For example, one configuration can provide support for CUDA without requiring it. Other new features include:

    • Level0 (SYCL/OneAPI support)
    • Improved OMPT support for supported compilers
    • Phiprof support (VLASIATOR)
    • StarPU support
    • Perfetto native tracing support (doesn't yet support flow events, JSON trace output still recommended)
    • Merged CSV output for both flat profile and task tree data
    • Python scripts (apex-summary.py, apex-treesummary.py) to post-process merged CSV output
    • Many bug fixes
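
The split into separate support libraries described above means a feature is enabled only if its library can be loaded at runtime. The following is a minimal Python sketch of that pattern, not APEX's actual loader. The library names in `OPTIONAL_BACKENDS` and the function `load_optional_backends` are hypothetical and exist only for illustration.

```python
import ctypes

# Hypothetical library names, used only for illustration.
OPTIONAL_BACKENDS = {
    "cuda": "libexample_cuda_support.so",
    "hip": "libexample_hip_support.so",
}


def load_optional_backends(candidates):
    """Try to load each optional library and keep only the ones that load."""
    loaded = {}
    for feature, libname in candidates.items():
        try:
            loaded[feature] = ctypes.CDLL(libname)
        except OSError:
            # The library (or one of its dependencies) is absent: skip the feature.
            continue
    return loaded


available = load_optional_backends(OPTIONAL_BACKENDS)
print("enabled features:", sorted(available))
```

Because a missing library is skipped rather than treated as an error, the same build can run on machines with and without a given runtime installed.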
  • view commit • Initial Level 0 support. Profiling works, but tracing timestamps are quite bogus.
  • view commit • Updating OneAPI support, still examining timestamps
  • view commit • Fixing APPLE pedantic build issues.
  • view commit • Adding native perfetto tracing support, very rudimentary support.
  • view commit • Working with correct timestamps now
  • view commit • Adding program name to program track in perfetto
  • view commit • Adding async events
  • view commit • Adding async events
  • view commit • Adding --apex:pftrace option to apex_exec for Perfetto and deprecating the Google Trace Events option
  • view commit • Debugging async events with HIP. Apparently the nvhpc 22.11 compiler crashes when compiling the perfetto.cc file.
  • view commit • Debugging perfetto support with nvhpc
  • view commit • Debugging hpx build
  • view commit • Adding perfetto license and readme
  • view commit • Debugging perfetto trace with hpx - make sure the trace is closed on exit
  • view commit • Making Perfetto native support optional, with -DAPEX_WITH_PERFETTO flag
  • view commit • Lots of fixes for starpu and pthreads. All detached threads are now correctly handled at exit, and starpu counters are back.
  • view commit • Debugging missing stop/starts from threads that are spawned by blas, cuda, starpu, etc.
  • view commit • Fixing pthread_create wrapper support for library threads launched before main. Also changing the order in which timers are written to taskgraph.0.dot, because when APEX MAIN, apex_preload_main, and APEX pthread wrapper timers aren't written first, they get lost in the graph. We want the graph anchored at those timers.
  • view commit • Timer throttling is now a runtime option, disabled by default.
  • view commit • Phiprof support
  • view commit • Adding timer throttle options to apex_exec
  • view commit • Renamed config file, removed a compilation message
  • view commit • Merge branch 'develop' into oneapi
  • view commit • Fixing links to HPX web sites
  • view commit • Debugging Level0 support and adding inclusive time for task lifetimes
  • view commit • Finally fixed the intel timestamps for tracing.
  • view commit • Making perfetto off by default.
  • view commit • Debugging on Crusher
  • view commit • Replacing broken CSV output with reduced CSV output from all ranks.
  • view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
  • view commit • Fully reduced CSV and tasktree output now. The tasktree output can be post-processed to generate any necessary Hatchet, Graphviz or Trilinos output if desired.
  • view commit • Adding general metrics for tasktree nodes!
  • view commit • Removing debug message
  • view commit • Adding ability to check whether MPI can accommodate the memory requested for big transfers. Use the APEX_VALIDATE_MPI_MEMORY_USAGE variable to enable it.
  • view commit • More advanced statistics for MPI bytes in the tasktree data, including min, max, mode, median, stddev
  • view commit • Removing BW computation during run, what's the point of adding overhead?
  • view commit • Merge branch 'develop' of github.com:khuck/xpress-apex into develop
  • view commit • Fixing non-portable code changes
  • view commit • Fixing portability bug without HIP/ROCm
  • view commit • Fixing sorting of csv tasktree output, I think
  • view commit • Adding reduction support for HPX+MPI. Still need support for other parcels.
  • view commit • Adding inclusive time to tasktree output.
  • view commit • Fixing units in tasktree output
  • view commit • Forgot to write out yields to profile csv
  • view commit • Fixing sorting errors in tasktree csv output
  • view commit • Adding script to post-process apex_profiles.csv and provide same output as what we get at the end of the run, but with greater flexibility.
  • view commit • Merge branch 'develop' of github.com:khuck/xpress-apex into develop
  • view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
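
One commit above adds min, max, mode, median and stddev for MPI bytes in the tasktree data. As a rough illustration of what those five statistics report, here is a small Python sketch using the standard library. The function name `summarize_bytes` and the sample values are hypothetical, and the use of the population standard deviation is an assumption (the changelog does not say which form APEX computes).

```python
import statistics


def summarize_bytes(samples):
    """Summarize a list of per-message byte counts."""
    return {
        "min": min(samples),
        "max": max(samples),
        "mode": statistics.mode(samples),      # most frequent message size
        "median": statistics.median(samples),  # middle of the sorted sizes
        "stddev": statistics.pstdev(samples),  # population form (assumed)
    }


# Hypothetical message sizes in bytes.
summary = summarize_bytes([1024, 2048, 2048, 4096])
for name, value in summary.items():
    print(f"{name}: {value}")
```

The mode and median are useful alongside the mean because message sizes are often skewed by a few very large transfers.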

    Version v2.6.1

    27 Dec 21:28

    Bug fixes after 2.6 release. Details:

  • view commit • StarPU support
  • view commit • Fixed source file inclusion
  • view commit • Fixing bugs on osx, formatting screen output to fit in 130 characters
  • view commit • Fixing formatting for screen
  • view commit • Updating to Rocm 5.4.0 and fixing compiler warnings/errors
  • view commit • Adding random search, and adding output from exhaustive and random search strategies for Kokkos autotuning.
  • view commit • Updating random tuning search, and adding some evaluation for simulated annealing search.
  • view commit • Updating StarPU support
  • view commit • Merge remote-tracking branch 'uo/develop' into develop
  • view commit • Debugging starpu support
  • view commit • Merge pull request #167 from coti/develop
  • view commit • Replacing hpx::apply with hpx::post to avoid deprecation warnings
  • view commit • Merge pull request #168 from STEllAR-GROUP/hpx_warning
  • view commit • Debugging on apple, adding option to sort screen timer names alphabetically (with APEX_SORT_TIMERS_BY_NAME)
  • view commit • Merge branch 'thread_stats' into develop
  • view commit • Pedantic debugging
  • view commit • Updating version number
  • view commit • Updating version number
    Version 2.6.0

    06 Dec 20:53

    Lots of bug fixes and new features.
    Highlights:

    • CUDA support updated through nvhpc 22.9 / cuda 11.9
    • HIP support updated through rocm 5.2.0
    • OMPT target offload support tested with amdclang 5.2.0, intel, nvhpc compilers

    The change log:

  • view commit • Working on OMPT update with target offload on MI250X with AMD 5.1.0 compilers
  • view commit • Minor changes and bug fixes from testing on crusher. Kokkos doesn't include the device ID any more (assume 0). When accumulated is 0.0, assume the timer hasn't been stopped.
  • view commit • Merge branch 'ompt_amd_target_update' of git.nic.uoregon.edu:/gitroot/xpress-apex into ompt_amd_target_update
  • view commit • Working OpenMP target support!
  • view commit • Fixing bugs in OMPT offload tracing, task dependencies, flow events. All seems working?
  • view commit • Working OpenMP Target Offload with tracing to GTrace and OTF2
  • view commit • Adding sync time measurement to collectives.
  • view commit • Debugging OMPT target offload with latest Intel OneAPI
  • view commit • Merge branch 'ompt_amd_target_update' into develop
  • view commit • Merge branch 'develop' of github.com:khuck/xpress-apex into develop
  • view commit • Adding comment for AMD implementation of OMPT
  • view commit • Check to make sure a profiler is returned when measuring bandwidth for MPI. This can happen when timers are requested after APEX has shut down.
  • view commit • Updating JSON tasktree output to support Hatchet analysis
  • view commit • Fixing tasktree output for Hatchet parsing.
  • view commit • executable requires -lstdc++ when building with icx
  • view commit • Found bug in destructor logic. Default thread id to INT_MAX.
  • view commit • Bug when configuring OMPT but not OTF2
  • view commit • Fixing OpenMP event annotation, including address whether the code was compiled with debug or not. Demangling now happens in apex_bfd.cpp, before line number or address is added to the symbol name.
  • view commit • Merge branch 'oneapi' into develop
  • view commit • Removing debug statement
  • view commit • Removing debug statement
  • view commit • More threaded statistic support
  • view commit • Testing with ROCm 5.2
  • view commit • Debugging latest PAPI hip/rocm counter support on crusher, fixing bugs with initializing the event set with the rocm component. With 5.2, all hardware counters should work.
  • view commit • Fixing csv output to be consistent across counters and timers.
  • view commit • Cleaning up compiler warning message
  • view commit • Adding support for counter groups, causes re-execution of the application and profile data is written to different directories for each pass.
  • view commit • Cleaning up papi counter groups, now also tracing the metrics to trace events.
  • view commit • Splitting GPU and CPU memory allocation leak tracking.
  • view commit • Debugging memory tracking for gpus and cpus
  • view commit • Configuring apex to install apex_exec when configured as part of HPX
  • view commit • Debugging support on polaris.
  • view commit • Adding apex_environment_help utility to list all APEX environment variables.
  • view commit • Cleaning up address resolution and providing a backtrace_symbols backup implementation when BFD not used
  • view commit • Adding new utility to dump all environment variables
  • view commit • Adding banner at program start to confirm things are working.
  • view commit • Adding Fortran MPI Wrapper support. Added wrappers for all the C MPI functions that we care about. Also debugged the "delayed start" feature; it will need some more testing with CUDA and OneAPI.
  • view commit • When started disabled, make sure CUDA returns immediately for callbacks and activity
  • view commit • Cleaning up build with hipcc & mpi & hpx
  • view commit • Debugging HPX+Kokkos
  • view commit • Kokkos renamed their tool library environment variable.
  • view commit • Trying to fix the Rocprofiler library rpath problem.
  • view commit • Only include apex_mpi.cpp if the MPI parcel port is used.
  • view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
  • view commit • Updating namespace for HPX bind call
  • view commit • Debugging HPX exit and replacing __func__ with pretty version where available.
  • view commit • Cleaning up output
  • view commit • Fixing bug where csv output doesn't happen without screen output
  • view commit • Updating Cray power counter support on crusher
  • view commit • HPX doesn't like the MPI wrapper on some systems. Disable it for now.

    v2.5.1

    25 Apr 22:53
  • view commit • The HPX configuration was missing ROCPROFILER
  • view commit • Adding ROC profiler sources for HPX when HIP support is requested
  • view commit • Cleaning up HIP enabled builds with GCC as compiler
  • view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
  • view commit • Fixing apparently broken Range Begin/End Push/Pop. CUPTI hasn't been making callbacks for range end or pop, so we will wrap those functions instead of processing callbacks.
  • view commit • Starting fix on range end events
  • view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
  • view commit • Updating version number
    APEX Version 2.5.0

    11 Apr 20:50
  • view commit • Trying to integrate OpenMP target offload for AMD and NVIDIA
  • view commit • Adding clock synchronization code, but not enabled yet.
  • view commit • Working MPI clock sync for OTF2. Still need to test and fully implement the HPX version. In this implementation, all ranks determine a latency between rank 0 and themselves, then compute a clock drift between the two. That drift is added to the archive file creation time as an absolute reference that all ranks can use.
  • view commit • Updating HPX version of clock sync before test with HPX build
  • view commit • Updated clock sync for OTF2 trace output for HPX. This doesn't actually do a clock sync, but uses the OTF2 archive creation time as the "baseline" timestamp for all localities. Unfortunately, I need to figure out a safe time to make lco requests from one rank to another - during the apex::initialize phase, HPX isn't quite ready to handle requests yet.
  • view commit • Fixing initial values for kokkos tuning search
  • view commit • Minor fix to prevent crash when resolving long Kokkos kernel names
  • view commit • Fixing initial value for interval set parameters. When tuning Kokkos kernels, we need to use the initial value of the variable, which is (supposed to be) given to us by Kokkos.
  • view commit • Cleaning up kokkos autotuning code
  • view commit • First commit for transferred/renamed repo
  • view commit • Allow for unlimited custom policies. Needed for Kokkos autotuning: allow as many tuning sessions as the system requests. Also reduce the output to the screen.
  • view commit • Fixing stat structure for OSX
  • view commit • Adding delayed start and max measurement. Adding the APEX_START_DELAY_SECONDS and APEX_MAX_DURATION_SECONDS options to specify a delayed start of measurement and a maximum length to measure. The maximum length does not include the delay, so a delay of 1 second and a maximum length of 2 seconds would record seconds 1-3 of an execution.
  • view commit • hipcc doesn't define the _OPENMP macro, so don't expect it.
  • view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
  • view commit • Cleaning up build as we port to spock. Need to clean up installation location of dependent packages to allow for re-building without deleting the install directory.
  • view commit • updating documentation links
  • view commit • Updating the readme with hip and pthread info, as well as reference links
  • view commit • adding implementation of `kokkosp_request_tool_settings` so we can disable fencing when profiling
  • view commit • Updating Kokkos tuning to support multiple sessions at the same time, of unlimited number. Also tweaked the simulated annealing search to converge in a reasonable time frame.
  • view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
  • view commit • Adding HSA events to the hip tracer
  • view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
  • view commit • Merge branch 'master' into develop
  • view commit • Adding OpenMP Target examples
  • view commit • Pushing for David to test
  • view commit • Working Kokkos tuning with caching
  • view commit • Fixing compiler bug that slipped through
  • view commit • Fixing code path for synchronous timer processing. Even when processing profiles synchronously, we were launching the background thread and signalling it when a profiler was created.
  • view commit • Don't use shared pointers when doing synchronous processing
  • view commit • lowering overhead in measurement. Don't enable hip or cuda by default, and provide hip command line options for apex_exec.
  • view commit • Fixing compiler error that shouldn't have been committed?
  • view commit • Fixing all dependency builds
  • view commit • Removing google profiler from apex_exec
  • view commit • Adding active harmony patch
  • view commit • Cleaning up config for CUDA and GCC and openmp offload
  • view commit • Fixing namespace for apex options
  • view commit • Adding output for gtrace output
  • view commit • cd to the output directory before post-processing
  • view commit • Cmake changes for subdir builds
  • view commit • Fixing kokkos tuning cache writing logic, and stopped generating overhead once the tuning is done.
  • view commit • Merge branch 'feature/allow-subdir-builds' of github.com:DavidPoliakoff/apex into develop
  • view commit • Fixing CMake variables to allow for subproject builds
  • view commit • Fixing compiler error with gcc 5.4
  • view commit • Need a minimum value for 'quarter' which is used in the calculation of the next candidate neighbor to test. If quarter is less than 2, we never get movement.
  • view commit • Adding ROCm SMI reader
  • view commit • Fixing bfd install step mistake
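
The delayed-start commit in this list states that APEX_MAX_DURATION_SECONDS does not include the APEX_START_DELAY_SECONDS delay. The arithmetic can be sketched as follows. The function `measurement_window` is a hypothetical helper for illustration, not part of APEX.

```python
def measurement_window(delay_seconds, max_duration_seconds):
    """Return the (start, end) of the measured interval, in seconds of runtime.

    The maximum duration is counted from the end of the delay, not from
    program start, as the commit message describes.
    """
    if delay_seconds < 0 or max_duration_seconds < 0:
        raise ValueError("delay and duration must be non-negative")
    start = delay_seconds
    end = delay_seconds + max_duration_seconds
    return start, end


# The example from the commit message: delay of 1 second, maximum of 2 seconds.
print(measurement_window(1, 2))  # (1, 3)
```

With a delay of 1 and a maximum of 2, measurement covers seconds 1 through 3, matching the example in the commit message.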

    Patch release v2.4.1

    15 Jun 22:53
    64c4f4c

    Emergency patch to fix HPX collectives API change in next HPX release.

    APEX Release v2.4.0

    28 May 22:53

    This is an update to APEX, with several new features including:

    • New simulated annealing search for policies
    • New Kokkos kernel autotuning support
    • Memory leak detection (experimental)
    • Updated scatterplot support, including counters and updated Python scripts to use python3
    • HIP/ROCm Roctracer support

    Full list of commits:

  • view commit • Don't enable examples by default
  • view commit • Kokkos doesn't like it if you replace the OpenMP library at runtime. So OMPT support now has to be explicitly enabled by --apex:ompt to preload the OpenMP runtime library (if desired).
  • view commit • Adding kokkos tuning support. Needs work.
  • view commit • Kokkos tuning working, but AH not getting right answer.
  • view commit • Working, but AH still stuck in local minima.
  • view commit • Adding/fixing PBS and SLURM variables
  • view commit • Fixing build error without kokkos autotuning
  • view commit • Trying to improve convergence for kokkos autotuning
  • view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
  • view commit • Debugging Kokkos tuning issues
  • view commit • Adding Kokkos tooling header, eliminates need to require Kokkos as a dependency
  • view commit • Adding quotes around path to harmony home
  • view commit • Working Kokkos autotuner. This uses a Nelder Mead search, with an initial radius of 0.5 centered on the initial point requested by Kokkos (if specified). Future work includes caching results and trying other search strategies like simulated annealing.
  • view commit • Refactoring kokkos tuning away from profiling, making it possible to disable it
  • view commit • Updating to python3
  • view commit • Writing a memory wrapper report. There's a huge amount of CUPTI memory leaks, and they happen when the first real call to CUDA happens. I can't force that call, or ignore memory during the first "real" call, yet.
  • view commit • Cleaner way of preventing "false"(?) CUPTI memory leaks.
  • view commit • Fixing memory leaks and instability during shutdown. When using the memory tracker, make sure that the reporting is done before the BFD address resolution infrastructure is destroyed.
  • view commit • Adding task tree ASCII output, for issue #150
  • view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
  • view commit • Adding "Remainder" to tree ASCII output.
  • view commit • Adding support for ratio and ordinal values
  • view commit • Fixing tree ASCII output and memory leak reporting.
  • view commit • Tasktree human readable is now in a file, and hierarchically sorted by time.
  • view commit • Making --apex:quiet truly quiet
  • view commit • Adding direct multidimensional simulated annealing search.
  • view commit • GCC 9.3.0 has an internal pedantic compiler error. So turning off pedantic.
  • view commit • Updating subproject build of LLVM OpenMP runtime for GCC
  • view commit • Fixing race condition in startup of memory wrapper, I hope...
  • view commit • Updating scripts to python3
  • view commit • Adding counter scatterplot support, too
  • view commit • Allowing for custom scatterplot fractions. To change from the default of 1% (0.01), set APEX_SCATTERPLOT_FRACTION equal to some value between 0.0 and 1.0.
  • view commit • Adding counter scatterplot script
  • view commit • Updating scatterplot scripts to handle larger scales
  • view commit • Do lazy opening of sample files so that the correct Node ID is used
  • view commit • improving colors
  • view commit • Merge branch 'develop' of github.com:khuck/xpress-apex into develop
  • view commit • More scatterplot cleanup
  • view commit • Updating escape sequence for new python
  • view commit • Fixing x axis to make all subgraphs uniform
  • view commit • Fixing dlsym() wrapper function to use templates for the function types, it's better than just blindly casting. Better to let the type system help us.
  • view commit • Added HIP to the configure and added a test case. It seems to work. Now have to add the actual roctracer support.
  • view commit • ROCTX support added.
  • view commit • Working callback support for HIP. Next step is to add activity support, and link the correlation IDs. That should be modeled after the CUPTI support.
  • view commit • Updating scatterplot scripts to add mean values
  • view commit • Working HIP with actions
  • view commit • Merge branch 'develop' into hip
  • view commit • Testing HIP code with CUDA config
  • view commit • Working HIP memory tracki...
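
The scatterplot commit above says APEX_SCATTERPLOT_FRACTION defaults to 1% (0.01) and accepts a value between 0.0 and 1.0. The sketch below shows one way such a fraction could be read and applied. Only the variable name and default come from the changelog. The helper names and the random per-sample selection are assumptions, not a description of how APEX chooses samples.

```python
import os
import random


def scatterplot_fraction(environ=os.environ) -> float:
    """Read the sampling fraction, defaulting to 0.01 as the changelog states."""
    fraction = float(environ.get("APEX_SCATTERPLOT_FRACTION", "0.01"))
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("APEX_SCATTERPLOT_FRACTION must be between 0.0 and 1.0")
    return fraction


def sample(values, fraction, rng):
    """Keep each value with probability `fraction` (assumed selection scheme)."""
    return [v for v in values if rng.random() < fraction]


rng = random.Random(0)  # fixed seed so repeated runs select the same samples
kept = sample(range(1000), scatterplot_fraction({}), rng)
print(f"kept {len(kept)} of 1000 samples")
```

A larger fraction gives denser scatterplots at the cost of larger sample files.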