Releases: bsc-pm/dlb
Version 3.5.0
Added
- Asynchronous support for classic LeWI
- Several SMT enhancements for LeWI policies
- Allow overriding LeWI classic/mask with --lewi-affinity
- TALP POP metrics now include experimental OpenMP hybrid metrics
- TALP global region is now exposed in the API
- TALP-Pages, a new tool for Continuous Performance Monitoring in static HTML pages
- Add flag --talp-region-select to filter active regions
- SLURM integration via dlb_taskset
- CMake config for other projects to link with DLB
- Several examples and documentation reworked
- DLB version information can be accessed through the API
Changed
- --talp-summary has been simplified: pop-metrics now also includes raw metrics if using an output file, and process metrics now include node identifiers
- TALP now only stores monitoring regions in shared memory if --talp-external-profiler is set
- TALP output structure has been reworked
- TALP main region is now called "Global"
Fixed
- LeWI mask now correctly supports threads blocked in MPI calls while pinned to multiple CPUs
- Add sanity checks for hardware counters in TALP
- Print JSON and CSV files in the proper locale
Deprecated
- The pop-raw and node values of --talp-summary are deprecated
- TALP output format XML is now deprecated
- --talp-regions-per-proc flag is deprecated in favor of a new experimental --shm-size-multiplier flag
- Several fields in dlb_monitor_t are now deprecated
- Several fields in dlb_pop_metrics_t are now deprecated
- DLB_MonitoringRegionGetMPIRegion deprecated in favor of DLB_MonitoringRegionGetGlobal
- DLB_Stats_GetCpuStateIdle functionality no longer provided
- DLB_Stats_GetCpuStateOwned functionality no longer provided
- DLB_Stats_GetCpuStateGuested functionality no longer provided
Version 3.4.1
Fixed
- Fix an error in the shared memory alignment that was causing segmentation faults when compiling with -march=native
- Avoid registering role shifting callbacks for other non-related OpenMP thread managers
- Update examples with supported options
- Fix some parameters in the Fortran'08 interface
- Be more resilient if PAPI fails to initialize
- Enhance compatibility in other systems
- Quote string names in CSV files
- Several other minor fixes
Version 3.4
Added
- PAPI support for TALP metrics
- libdlb_mpic.so and libdlb_mpic_*.so are C MPI only libraries that may be built using --enable-c-mpi-library at configure time
- Functions to reset, stop, start and report monitoring regions now accept the special argument DLB_MPI_REGION for the implicit region
- Function DLB_TALP_QueryPOPNodeMetrics for third-party applications to query POP metrics. Requires --talp-external-profiler.
- Named barriers and several API functions to manage them
- Added --lewi-barrier and --lewi-barrier-select to fine-tune which barriers activate LeWI
- Added --lewi-color to select specific key only for LeWI
Changed
- libdlb_mpif.so and libdlb_mpif_*.so are no longer built by default, only if --enable-fortran-mpi-library is set at configure time
- Flag --quiet now only suppresses INFO and VERBOSE; added new flag --silent to keep the old functionality of suppressing all messages
- Refactor DLB_TALP_CollectNodeMetrics to DLB_TALP_CollectPOPNodeMetrics and add communication efficiency
- TALP now appends to CSV files if they already exist
Fixed
- Fixed wrong generated code for MPI_Initialized and MPI_Finalized
Deprecated
- --lewi-ompt no longer accepts "mpi" or "aggressive" as values. Automatic LeWI via synchronization calls is now done with --lewi-mpi-calls for MPI and --lewi-barrier or --lewi-barrier-select for DLB Barriers.
Version 3.3.1
Fixed
- Fixed wrong generated code for MPI_Initialized and MPI_Finalized
Version 3.3
Added
- Free agent and Role-shift OMPT thread managers to support LeWI with both implementations
- Flag --ompt-thread-manager to select which OpenMP implementation to use
- MPI Fortran 2008 bindings
- TALP flag --talp-output-file to generate files in different output formats
- New TALP collective functions to gather and compute metrics: DLB_TALP_CollectPOPMetrics and DLB_TALP_CollectNodeMetrics
Changed
- libdlb_mpi.so and libdlb_mpi_*.so now have both C and Fortran MPI symbols
Fixed
- Fixed DROM pre-initialization if child had empty cpuset affinity
- Fixed --lewi-max-parallelism
- Fixed several TALP bugs
- Fixed some finalization errors during MPI finalize
- Fixed cpuset parsing when provided a non-contiguous mask
Version 3.2
Added
- Flag --verbose to enable all verbose modes
- Flag --talp-summary=pop-raw to print raw POP metrics
- Flag --lewi-respect-cpuset to allow LeWI to use CPUs not yet registered
Changed
- DROM can now steal all CPUs from one process
- DROM can now inherit a subset of CPUs from another process
- DLB_DROM_SetProcessMask to oneself no longer requires a DLB_pollDROM
- DLB_Lend in OpenMP applications now invokes the OpenMP runtime to change the number of threads
Fixed
- Fixed TALP regions enabled or registered only on some processes
- Fixed minor option parsing
Version 3.1
Added
- New --lewi-mpi-calls value: none
- New MPI runtime version check during initialization
- Experimental meson build files
- Add better support for getting/setting process mask from own process
Changed
- Enable --barrier by default
- Rename --lewi-mpi to its opposite: --lewi-keep-one-cpu, and change the default behavior
- Rename --talp-summary=app to --talp-summary=pop-metrics and make it the default value
- Rename TALP to Tracking Application Live Performance
- CPU priority now follows a better topology order
- Properly clean up shared memory during initialization
Fixed
- Fixed several TALP issues
- Improved support for OMPT
- Proper shared memory clean up if running under dlb_run
Deprecated
- Python viewer scripts are no longer installed
Version 3.0
Added
- New TALP module: Tracking Application Low-level Performance
- TALP Monitoring Regions for user-defined regions
- Allow processes to attach to / detach from the Barrier module
- Improve the verbose messages for some modules
- Allow partial instrumentation of some events
- Man pages for DLB commands
Changed
- DLB library now always prints to stderr, DLB binaries may still use stdout
- Dropped support for binary mask old format 1000b in favor of 0b0001
- DLB_ARGS variable now takes precedence over DLB_Init argument
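
For scripts that still carry masks in the old notation, the change of binary mask format can be sketched in a few lines of Python. This is only an illustration, not DLB code: the helper names old_mask_to_new and cpus_in_mask are hypothetical, and the sketch assumes that the old format listed CPU 0 first (leftmost) while the new format is an ordinary binary literal with CPU 0 as the least significant bit, which is what the 1000b / 0b0001 pair above suggests.

```python
def old_mask_to_new(old):
    """Convert a legacy '1000b' style mask string to the '0b0001' style.

    Assumption (not confirmed by the release notes): the legacy format
    lists CPU 0 first, the new format keeps CPU 0 as the last digit.
    """
    bits = old[:-1]
    if not old.endswith("b") or not bits or set(bits) - {"0", "1"}:
        raise ValueError("not a legacy binary mask: %r" % old)
    # Reverse the digit order and move the marker to the front.
    return "0b" + bits[::-1]


def cpus_in_mask(new):
    """Return the CPU ids set in a '0b...' mask, lowest id first."""
    if not new.startswith("0b"):
        raise ValueError("not a 0b-prefixed binary mask: %r" % new)
    value = int(new[2:], 2)
    # Bit i of the integer value corresponds to CPU i.
    return [i for i in range(value.bit_length()) if (value >> i) & 1]
```

Under these assumptions, old_mask_to_new("1000b") and "0b0001" both describe a mask containing only CPU 0.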
Fixed
- Some callbacks not being invoked when the action involved some successful actions and some others not allowed
- Several DROM inconsistencies
- Several minor fixes
- Minor documentation fixes
Version 2.1
Added
- OMPT full support
- Add function to API: DLB_UnsetMaxParallelism
- New binary dlb_run to preinit masks, needed for OMPT applications
- Improved dlb_shm --list output
- New verbose option affinity to print hardware information
- Add option --quiet to silence all info and warning messages
- New test mechanism based on LIT
Changed
- DLB_Lend no longer keeps the current CPU
- PreInit service now handles the timeout if the synchronous flag is provided
- DROM now accepts registering and setting empty masks
- Verbose options now affect all library versions
Fixed
- Added some mechanisms to clean shared memories when the program aborts
- Fixed several race conditions with the asynchronous messages
- Fixed an issue where --lewi-affinity was being ignored
- Adapt Fortran headers to be fixed-form compatible
Version 2.0.2
Minor hotfixes