All notable changes to this project will be documented in this file
The format is based on Keep a Changelog.
3.5.0 2024-12-03
- Asynchronous support for classic LeWI
- Several SMT enhancements for LeWI policies
- Allowed to override lewi classic/mask with
--lewi-affinity
- TALP POP metrics now includes experimental OpenMP hybrid metrics
- TALP global region is now exposed in the API
- TALP-Pages, a new tool for Continuous Performance Monitoring in static HTML pages
- Add flag
--talp-region-select
to filter active regions - SLURM integration via
dlb_taskset
- CMake config for other projects to link with DLB
- Several examples and documentation reworked
- DLB version information can be accessed though the API
--talp-summary
has been simplified and nowpop-metrics
also includes raw metrics if using an output file, andprocess
metrics now includes node identifiers- TALP now only stores monitoring regions in shared memory if
--talp-external-profiler
is set - TALP output structure has been reworked
- TALP main region is now called "Global"
- LeWI mask now correctly supports threads blocked in MPI calls while pinned to multiple CPUs
- Add sanity checks for hardware counters in TALP
- Print JSON and CSV files in the proper locale
--talp-summary
values forpop-raw
andnode
are deprecated- TALP output format XML is now deprecated
--talp-regions-per-proc
flag is deprecated for a new experimental--shm-size-multiplier
flag- Several fields in
dlb_monitor_t
are now deprecated - Several fields in
dlb_pop_metrics_t
are now deprecated DLB_MonitoringRegionGetMPIRegion
deprecated in favor ofDLB_MonitoringRegionGetGlobal
DLB_Stats_GetCpuStateIdle
functionality no longer providedDLB_Stats_GetCpuStateOwned
functionality no longer providedDLB_Stats_GetCpuStateGuested
functionality no longer provided
3.4.1 2024-08-16
- Fix an error in the shared memory alignment that was causing
segmentation faults when compiling with
-march=native
- Avoid registering role shifting callbacks for other non-related OpenMP thread managers
- Update examples with supported options
- Fix some parameters in the Fortran'08 interface
- Be more resilient if PAPI fails to initialize
- Enhance compatibility in other systems
- Quote string names in csv files
- Several other minor fixes
3.4 2023-12-22
- PAPI support for TALP metrics
libdlb_mpic.so
andlibdlb_mpic_*.so
are C MPI only libraries that may be built using--enable-c-mpi-library
at configure time- Functions to reset, stop, start and report monitoring regions now accept the special argument DLB_MPI_REGION for the implicit region
- Function
DLB_TALP_QueryPOPNodeMetrics
for third-party applications to query pop metrics. Requires--talp-external-profiler
. - Named barriers and several API functions to manage them
- Added
--lewi-barrier
and--lewi-barrier-select
to fine-tune which barriers activate LeWI. - Added
--lewi-color
to select specific key only for LeWI
libdlb_mpif.so
andlibdlb_mpif_*.so
are no longer built by default, only if--enable-fortran-mpi-library
is set at configure time- Flag
--quiet
now only suppresses INFO and VERBOSE, added new flag--silent
to keep the old functionality to suppress all messages - Refactor
DLB_TALP_CollectNodeMetrics
toDLB_TALP_CollectPOPNodeMetrics
and add communication efficiency - TALP now appends to CSV files if they already exist
- Fixed wrong generated code for
MPI_Initialized
andMPI_Finalized
--lewi-ompt
no longer accepts "mpi" nor "aggressive" as values. Automatic LeWI via synchronization calls is now done with--lewi-mpi-calls
for MPI and--lewi-barrier
or--lewi-barrier-select
for DLB Barriers.
3.3.1 2023-05-18
- Fixed wrong generated code for
MPI_Initialized
andMPI_Finalized
3.3 2023-05-15
- Free agent and Role-shift OMPT thread managers to support LeWI with both implementations
- Flag
--ompt-thread-manager
to select which OpenMP implementation to use - MPI Fortran 2008 bindings
- TALP flag to generate file in different output formats
--talp-output-file
- New TALP collective functions to gather and compute metrics:
DLB_TALP_CollectPOPMetrics
andDLB_TALP_CollectNodeMetrics
libdlb_mpi.so
andlibdlb_mpi_*.so
have now both C and Fortran MPI symbols
- Fixed DROM pre-initialization if child had empty cpuset affinity
- Fixed
--lewi-max-parallelism
- Fixed several TALP bugs
- Fixed some finalization errors during MPI finalize
- Fixed cpuset parsing when provided a non-contiguous mask
3.2 2022-03-16
- Flag
--verbose
to enable all verbose modes - Flag
--talp-summary=pop-raw
to print raw POP metrics - Flag
--lewi-respect-cpuset
to allow LeWI to use CPUs not yet registered
- DROM can now steal all CPUs from one process
- DROM can now inherit a subset of CPUs from other process
DLB_DROM_SetProcessMask
to oneself does not longer require aDLB_pollDROM
DLB_Lend
in OpenMP applications now invokes the OpenMP runtime to change the number of threads
- Fixed TALP regions enabled or registered only on some processes
- Fixed minor option parsing
3.1 2021-11-05
- New
--lewi-mpi-calls
value:none
- New MPI runtime version check during initialization
- Experimental meson build files
- Add better support for getting/setting process mask from own process
- Enable
--barrier
by default - Rename
--lewi-mpi
to its opposite:--lewi-keep-one-cpu
, and change the default behavior - Rename
--talp-summary=app
to--talp-summary=pop-metrics
and make it the default value - Rename TALP to Tracking Application Live Performance
- CPU priority now follows a better topology order
- Properly clean up shared memory during initialization
- Fixed several TALP issues
- Improves support for OMPT
- Proper shared memory clean up if running under
dlb_run
- Python viewer scripts are no longer installed
3.0 2021-01-15
- New TALP module: Tracking Application Low-level Performance
- TALP Monitoring Regions for user-defined regions
- Allow processes to attach to / detach from the Barrier module
- Improve the verbose messages for some modules
- Allow partial instrumentation of some events
- Man pages for DLB commands
- DLB library now always prints to stderr, DLB binaries may still use stdout
- Dropped support for binary mask old format
1000b
in favor of0b0001
DLB_ARGS
variable now takes precedence overDLB_Init
argument
- Some callbacks not being invoked when the action involved some successful actions and some others not allowed
- Several DROM inconsistencies
- Several minor fixes
- Minor documentation fixes
2.1 2019-07-15
- OMPT full support
- Add function to API:
DLB_UnsetMaxParallelism
- New binary
dlb_run
to preinit masks, needed for OMPT applications - Improved
dlb_shm --list
output - New verbose option
affinity
to print hardware information - Add option
--quiet
to silent all info and warning messages - New test mechanism based on LIT
DLB_Lend
does no longer keeps the current CPU- PreInit service now handles the timeout if the synchronous flag is provided
- DROM now accepts registering and setting empty masks
- Verbose options now affect all library versions
- Added some mechanisms to clean shared memories when the program aborts
- Fixed several race conditions with the asynchronous messages
- Fixed an issue where
--lewi-affinity
was being ignored - Adapt Fortran headers to be fixed-form compatible
2.0.2 2018-10-29
DLB_ARGS
was not being correctly parsed in some cases- DROM API contained a typo. New function is called
DLB_DROM_Detach
- Intel compiler and GCC8 compatibility
- Several minor bugs
2.0.1 2018-02-27
- MaxParallelism is updated correctly when set and later decreased
- AcquireCpus now correctly ignores reclaimed CPUs
- Visualization errors parsing
DLB_ARGS
- Removed non intended warning about setting CPUs when using LeWI with MPI support
- Several minor fixes
2.0 2017-12-21
- Callback system
- API for subprocesses
- New interaction mode choice (synchronous as it was before, or asynchronous)
- Doxygen HTML and man pages documentation
- Test coverage support
- OMPT experimental support
- API reworked
- A petition of a CPU now registers the process into a CPU petition queue. If this CPU becomes available, DLB can schedule which process will acquire it
- A petition of an unspecified CPU registers the process into a global petition queue, this queue has less priority than the CPU queue
- DLB acquire now does not schedule because it forces the acquisition, DLB Borrow does scheduling but only looks for idle CPUs and never creates a CPU request
- DLB options print format reworked,
DLB_ARGS
is now used to pass options to DLB. - Fortran Interface is now provided using an include with ISO C bindings.
- Shared Memory synchronization mechanism is now managed using a pthread spinlock
- DROM services now use the same shmem handler so no need to call
DLB_Init
andDLB_DROM_Init
- DROM services Get/Set process mask are now asynchronous functions
- Test suite now follows a Unit Testing approach and does not need undocumented external tools
- Several minor bugs
- PreInit service now correctly handles environment variables and allows forks
DLB_MASK
is no longer used, only registered CPUs may be used by other processesDLB_DROM_GetCpus
service has been removedLB_OPTION
type of environment variables is deprecated
1.3.2 - 2017-10-9
- Bug initialing shared memory when using non-mask policies
1.3.1 - 2017-10-4
- Bug parsing
DLB_ARGS
1.3 - 2017-07-12
- This CHANGELOG file
- Shared Memory consistency checks upon registration
- The binary
dlb_taskset
can now be used to launch processes under DLB - Service
DLB_Barrier
to perform a barrier between all processes in the node - DLB Services now return an error code
- New User Option
LB_PRIORITY
to manage how CPUs are distributed according to HW locality - Add services for DROM to PreInit and PostFinalize to manage process that will fork/exec
- Add services to check if the process mask needs to be changed
- Refactor policy based Shared Memory into two general purpose
cpuinfo
andprocinfo
- DROM interface is now considered stable. External processes can manage the CPU ownership of DLB processes
- DLB options are now parsed through the environment variable
DLB_ARGS
- Several minor bugs
- Signal handler feature is no longer supported
1.2 - 2016-12-23
- Improved User Guide
- Improved configure script and macros
- Compatibility with gcc6
- Compatibility with icc17
- Compatibility with C11 standard
- Several minor bugs
1.1 - 2016-02-04
- Policy
Autonomous Lewi Mask
to micro-manage each thread individually - DLB services to init/finalize without MPI support
- DLB services to enable/disable or mark a serial/parallel sections of code
- DLB service to modify User Options during program execution
- New library independent from MPI
- Binary
dlb_shm
to manage Shared Memory from CLI - Binary
dlb_taskset
to manage processes' mask from CLI - BG/Q support
- Add
LB_MASK
environment variable to specify aDLB_MASK
common to all processes - Experimental implementation of
DLB_Stats
interface - Experimental implementation of
DLB_DROM
interface - Signal handler feature to clean up DLB on termination
- In-code Doxygen documentation
- User guide using Sphinx generator
- Distributable examples
- Replace IPC with POSIX Shared Memory
- Replace Shared Memory synchronization mechanism from semaphores to pthread mutexes
- Shared Memory creation is now decentralized from rank 0 and it will be created when necessary
- Shared Memory internal structure reworked to host information about ownership and guesting of CPUs
- CPUs from
DLB_MASK
can guest any process, but they will not have an owner - Improve support for detection of Shared Memory runtimes
- Compatibility with gcc 4.4 and 4.5
- Compatibility with MPI-3 standard
- Fortran interfaces now use the correct types
- Several minor bugs
- DLB Interface
- MPI Interception Interface
- User options configured by environment variables
LB_*
- Lend When Idle algorithm implementation
- DPD
- Extrae instrumentation
- Paraver configs for DLB events
- Inter-Process Communication (IPC) Shared Memory
- Inter-Process Communication (IPC) Sockets
- Thread binding interface
LeWI
policies to support SMPSS and OpenMPLeWI mask
policy to support OmpSs- RaL and PERaL policies for redistribution of resources
- HWLOC optional dependency to query HW info
- Scheduling decisions based on HW locality
- Binary
dlb