Releases: esi-neuroscience/acme

2023.12

07 Dec 15:19

Better support for non-x86 micro-architectures. On the ESI HPC cluster,
the convenience function esi_cluster_setup now transparently works with the
local "E880" partition comprising our IBM POWER E880 servers. Similar to
the x86 nodes, a simple

client = esi_cluster_setup(n_workers=10, partition="E880")

is enough to launch ten SLURM workers each equipped with four POWER8 cores
and 16 GB of RAM by default. In addition, ACME's automatic partition selection
has been extended to also cover workloads running inside the "E880" partition.
Beyond that, esi_cluster_setup has not only become simpler to use but also
offers more (still completely optional) customization settings:
the new keyword cores_per_worker can be used together with mem_per_worker
and job_extra to create specialized computing clients custom-tailored
to specific workload requirements, e.g.,

client = esi_cluster_setup(n_workers=10,
                           cores_per_worker=3,
                           mem_per_worker="12GB",
                           job_extra=["--job-name='myjob'"],
                           partition="E880")

For more details, see Advanced Usage and Customization

NEW

  • New keyword cores_per_worker in esi_cluster_setup to explicitly set
    the core-count of SLURM workers.
  • Extended functionality of ACME's partition auto-selection on the ESI
    HPC cluster to include IBM POWER machines in the "E880" partition
  • Added new "Tutorials" section in documentation
  • Added new tutorial on using ACME for parallel evaluation of classifier
    accuracy (Thanks to @timnaher, cf #53)
  • Added new tutorial on using ACME for parallel neural net model evaluation
    (Thanks to @timnaher, cf #53)
  • Added type hints following PEP 484 to support static code analyzers
    (e.g., mypy) and to clarify type conventions in internal functions with
    "sparse" docstrings (a brief illustration follows below).

CHANGED

  • To avoid obscure (and hard to debug) errors, esi_cluster_setup now
    checks the micro-architecture of the submitting host against the chosen
    partition. This prevents accidentally launching ppc64le SLURM jobs from
    inside an x86_64 Python interpreter and vice versa (see the sketch below).
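
The gist of such a guard, shown as a simplified illustration only (the
partition-to-architecture mapping and the helper function below are
hypothetical, not ACME's actual code):

import platform

# hypothetical mapping of SLURM partitions to the CPU architecture they run on
PARTITION_ARCH = {"E880": "ppc64le", "8GBXS": "x86_64"}

def check_architecture(partition: str) -> None:
    host_arch = platform.machine()  # e.g. "x86_64" or "ppc64le"
    wanted = PARTITION_ARCH.get(partition)
    if wanted is not None and wanted != host_arch:
        raise ValueError(f"Cannot launch {wanted} SLURM jobs from a {host_arch} host")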

REMOVED

  • The partition keyword in esi_cluster_setup no longer has a default
    value (the old default of "8GBXS" was inappropriate most of the time)
  • The (undocumented) "anonymous" keyword n_cores of esi_cluster_setup
    has been removed in favor of the explicit cores_per_worker (now also
    visible in the API). Just like n_cores, setting the new cores_per_worker
    parameter is still optional: by default, esi_cluster_setup derives the
    core count from DefMemPerCPU and the chosen value of mem_per_worker
    (see the sketch after this list).
  • slurm_cluster_setup no longer uses DefMemPerCPU as a fallback substitute
    in case MaxMemPerCPU is not defined for the chosen partition (the fallback
    could be overly restrictive on requested memory settings)
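
One plausible reading of the default core-count derivation mentioned above
(an illustrative sketch only, not ACME's actual implementation; the function
name and units are made up):

import math

def derive_core_count(mem_per_worker_gb: float, def_mem_per_cpu_gb: float) -> int:
    # request enough cores so that the partition's per-CPU memory allowance
    # (SLURM's DefMemPerCPU) covers the requested mem_per_worker
    return max(1, math.ceil(mem_per_worker_gb / def_mem_per_cpu_gb))

derive_core_count(12, 4)  # -> 3 cores for a 12 GB worker at 4 GB per CPU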

DEPRECATED

  • Using start_client in local_cluster_setup no longer has any effect:
    starting a dask LocalCluster always starts a client.

FIXED

  • Fixed a partition bug in run_tests.sh (Thanks to @timnaher, cf #53)
  • Simplified and fixed interactive user queries: the builtin select module
    is used everywhere except in Jupyter, where queries rely on the builtin
    input function instead (see the sketch after this list).
  • Clarified the docstring discussing result_dtype: it must be a str, not
    None (still defaults to "float")
  • Numerous corrections of errata and outdated information in docstrings
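
A rough illustration of that pattern (a hypothetical helper, not ACME's
actual implementation):

import select
import sys

def timed_prompt(msg: str, timeout: float = 60, in_notebook: bool = False) -> str:
    if in_notebook:
        # select cannot poll the notebook's stdin stream, so fall back to input()
        return input(msg)
    print(msg, end="", flush=True)
    ready, _, _ = select.select([sys.stdin], [], [], timeout)
    return sys.stdin.readline().strip() if ready else ""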

2023.4

17 Apr 09:14

Re-designed ACME's logs and command line output.

NEW

  • Created templates for filing issues and opening Pull Requests for ACME
    on GitHub.
  • Enabled private security reporting in ACME's GitHub repository and
    added a security policy for ACME (in compliance with the OpenSSF Best
    Practices Badge)

CHANGED

  • Overhauled ACME's logging facilities: many print messages have been
    marked "DEBUG" to make ACME's default output less "noisy". To this end,
    the Python logging module is now used more extensively than before.
    The canonical name of ACME's logger is simply "ACME" (see the example
    after this list).
  • By default, ACME now creates a log-file alongside any auto-generated
    output files to keep a record of file creation and attribution.
  • Reworked ACME's SyNCoPy interface: a dedicated module spy_interface.py
    now manages ACME's I/O direction when ACME is called by SyNCoPy. This
    allows for (much) cleaner exception handling in ACME's cluster helpers
    (esi_cluster_setup, cluster_cleanup etc.), which ultimately permits
    a more streamlined extension of ACME to additional HPC infrastructure.
  • Redesigned ACME's online documentation: increased font-size to enhance
    readability, included a contribution guide and reworked the overall page
    navigation + visual layout.
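
Because the logger name is the only ACME-specific detail, downstream code can
adjust ACME's verbosity through Python's standard logging machinery, e.g.
(a minimal sketch):

import logging

# lower the threshold of ACME's canonical logger to also show "DEBUG" messages
logging.getLogger("ACME").setLevel(logging.DEBUG)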

2022.12

15 Dec 10:59

Bugfix release.

CHANGED

  • If not provided, a new lower default value of one is used for n_workers_startup

FIXED

  • Updated the memory estimation logic on the ESI HPC cluster: if ACME does
    not handle distribution of result output but memory estimation is still
    requested, the memEstRun keyword injection is not performed.

2022.11

11 Nov 12:52

Major changes in managing auto-generated files

  • If write_worker_results is True, ACME now creates an aggregate results
    container consisting of external links that point to the actual data in
    HDF5 payload files generated by the parallel workers.
  • Optionally, results can be slotted into a single dataset/array (via the
    result_shape keyword).
  • If single_file is True, ACME does not store the results of parallel
    compute runs in dedicated payload files; instead, all workers write to a
    single aggregate results container.
  • By providing output_dir, the location of auto-generated HDF5/pickle files
    can be customized (see the sketch following this list).
  • Entities in a distributed computing client that concurrently process tasks
    are now consistently called "workers" (in line with dask terminology).
    Accordingly the keywords n_jobs, mem_per_job, n_jobs_startup and
    workers_per_job have been renamed n_workers, mem_per_worker,
    n_workers_startup and processes_per_worker, respectively. To ensure
    compatibility with existing code, the former names have been marked
    deprecated but were not removed and are still functional.
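
Combining some of these options, a call might look as follows (a sketch for
illustration only; the user function, worker count and output location are
made up):

from acme import ParallelMap

def scale(x, fac):
    return x * fac

with ParallelMap(scale,
                 [2, 4, 6, 8],               # one compute run per element
                 fac=3,
                 n_workers=4,                 # renamed from n_jobs
                 write_worker_results=True,   # auto-generate HDF5 output
                 single_file=True,            # workers share one results container
                 output_dir="acme_results") as pmap:
    pmap.compute()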

A full list of changes is provided below

NEW

  • Included keyword output_dir in ParallelMap that allows customizing the
    storage location of files auto-generated by ACME (HDF5 and pickle). Only
    effective if write_worker_results is True.
  • Added keyword result_shape in ParallelMap to permit specifying the shape
    of an aggregate dataset/array into which the results of all computational
    runs are slotted. In conjunction with the shape specification, the new
    keyword result_dtype offers the option to control the numerical type (set
    to "float64" by default) of the resulting dataset (if write_worker_results
    = True) or array (write_worker_results = False). On-disk dataset results
    collection is only available for auto-generated HDF5 containers (i.e.,
    write_pickle = False)
  • Introduced keyword single_file in ParallelMap to control whether parallel
    workers store results of computational runs in dedicated HDF5 files (single_file = False,
    default) or share a single results container for saving (single_file = True).
    This option is only available for auto-generated HDF5 containers, pickle
    files are not supported (i.e., write_worker_results = True and
    write_pickle = False).
  • Included options to specify worker count and memory consumption in local_cluster_setup
  • Added a new section "Advanced Usage and Customization" in the online documentation
    that discusses settings and associated technical details
  • Added support for Python 3.10 and updated dask dependencies

CHANGED

  • Modified employed terminology throughout the package: to clearly delineate
    the difference between compute runs and worker processes (and to minimize
    friction between the documentation of ACME and dask), the term "worker"
    is now consistently used throughout the code base. If ACME is running on a
    SLURM cluster, a dask "worker" corresponds to a SLURM "job".
  • In line with the above change, the following input arguments have been
    renamed:
    • in ParallelMap:
      • n_jobs -> n_workers
      • mem_per_job -> mem_per_worker
    • in esi_cluster_setup and slurm_cluster_setup:
      • n_jobs -> n_workers
      • mem_per_job -> mem_per_worker
      • n_jobs_startup -> n_workers_startup
    • in slurm_cluster_setup:
      • workers_per_job -> processes_per_worker
  • Made esi_cluster_setup respect already running clients so that new parallel
    computing clients are not launched on top of existing ones (thanks to @timnaher)
  • Introduced support for positional/keyword arguments of unit-length in
    ParallelMap so that n_inputs can be used as a scaling parameter to launch
    n_inputs calls of a user-provided function (see the sketch after this list)
  • All docstrings and the online documentation have been re-written (and in
    parts clarified) to account for the newly introduced features.
  • Code coverage is no longer computed by a GitHub Actions workflow but is
    now calculated by the GitLab CI job that invokes SLURM to run tests on
    the ESI HPC cluster.
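
For instance, unit-length arguments combined with n_inputs might be used like
this (a hypothetical sketch based on the description above):

from acme import ParallelMap

def simulate(seed, n_trials):
    return seed + n_trials  # placeholder computation

# all arguments have unit length, so n_inputs determines the call count:
# this launches five evaluations of simulate(42, n_trials=100)
with ParallelMap(simulate, 42, n_trials=100, n_inputs=5) as pmap:
    pmap.compute()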

DEPRECATED

The keywords n_jobs, mem_per_job, n_jobs_startup and workers_per_job
have been renamed. Using these keywords is still supported but raises a
DeprecationWarning.

  • The keywords n_jobs and mem_per_job in both ParallelMap and
    esi_cluster_setup are deprecated. To specify the number of parallel
    workers and their memory resources, please use n_workers and mem_per_worker,
    respectively (see corresponding item in the Section CHANGED above)
  • The keyword n_jobs_startup in esi_cluster_setup is deprecated. Please
    use n_workers_startup instead

FIXED

  • Updated dependency versions (pin click to version < 8.1) and fixed Syncopy
    compatibility (increase recursion depth of input size estimation to one
    million calls)
  • Streamlined the dryrun stopping logic invoked if the user chooses not to
    continue with the computation after performing a dry-run
  • Modified tests that are supposed to use an existing distributed computing
    client to not shut down that very client
  • Updated memory estimation routine to deactivate auto-generation of results
    files to not accidentally corrupt pre-allocated containers before launching
    the actual concurrent computation

[2022.8] - 2022-08-05

04 Aug 14:07

Bugfixes, new automatic ESI-HPC SLURM partition selection, expanded Python
version compatibility, updated dependencies, and an overhaul of the online
documentation.

NEW

  • On the ESI HPC cluster, using partition="auto" in ParallelMap now launches
    a heuristic automatic SLURM partition selection algorithm (instead of simply
    falling back to the "8GBXS" partition)
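
On the ESI cluster, this can be requested directly in the ParallelMap call
(a minimal sketch; the user function is made up):

from acme import ParallelMap

def double(x):
    return 2 * x

# let ACME's heuristic pick a suitable SLURM partition for this workload
with ParallelMap(double, [1, 2, 3, 4], partition="auto") as pmap:
    pmap.compute()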

CHANGED

  • Updated package dependencies (allow h5py ver 3.x) and expanded support for
    recent Python versions (include 3.9)
  • Restructured and expanded the online documentation based on suggestions
    from @naehert: moved most examples and usage notes from ParallelMap's
    docstring to dedicated documentation pages and added a new
    "Troubleshooting + FAQ" site.

FIXED

  • Repeated ParallelMap calls ignored differing logfile specifications. This
    has been corrected. In addition, the logging setup routine now ensures that only
    one FileHandler is used (any existing non-default log-file locations are
    removed from the logger to avoid generating multiple logs and/or accidentally
    appending to existing logs from previous runs).

2022.7

07 Jul 09:10

[2022.7] - 2022-07-06

Bugfixes, new versioning scheme and updated dependencies.

CHANGED

  • Modified versioning scheme: use date-based version tags instead of increasing
    numbers
  • Updated dask, dask-jobqueue and scipy dependency requirements
  • Removed any mentions of "hpx" from the code after upgrading the main file-server
    of the ESI cluster

FIXED

  • Repaired broken FQDN detection in is_esi_node

0.21

01 Mar 11:02

[0.21] - 2022-03-01

Performance improvements, new dryrun keyword and preparations for deploying
ACME on other clusters

NEW

  • Re-designed cluster startup code: added new function slurm_cluster_setup that
    includes SLURM-specific (but ESI-agnostic) code for spinning up a SLURMCluster
  • Included new dryrun keyword in ParallelMap to test-drive ACME's
    automatically generated argument lists by simulating a single (randomly
    picked) worker call prior to the actual concurrent computation
    (addresses #39; see the sketch after this list)
  • Added helper function is_esi_node to determine if ACME is running on the ESI
    HPC cluster
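
A dry-run can be requested directly in the ParallelMap call (a minimal
sketch; the user function is made up):

from acme import ParallelMap

def square(x):
    return x ** 2

# simulate a single randomly picked call before committing to the full run
with ParallelMap(square, list(range(100)), dryrun=True) as pmap:
    pmap.compute()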

CHANGED

  • Do not parse scalars using numbers.Number, use numpy.number instead to
    catch Boolean values
  • Included conda clean in CD pipeline to avoid disk fillup by unused conda
    packages/cache

DEPRECATED

  • Retired conda2pip in favor of the modern setup.cfg dependency management
    system. ACME's dependencies are now listed in setup.cfg which is used to
    populate the conda environment file acme.yml at setup time.
  • Retired Travis CI tests since free test runs were exhausted; migrated to
    GitHub Actions (and re-included codecov)

FIXED

  • On the ESI HPC cluster set the job CPU count depending on the chosen partition
    if not explicitly provided by the user (one core per 8GB of RAM, e.g., jobs in
    a 32GB RAM partition now use 4 cores instead of just one)

v0.2c

19 Oct 17:22

[v0.2c] - 2021-10-19

NEW

  • Included function local_cluster_setup to launch a local distributed Dask
    multi-processing cluster running on the host machine
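
A minimal usage sketch (assuming, as with esi_cluster_setup, that a client is
returned; optional keyword arguments are omitted):

from acme import local_cluster_setup

# launch a local dask multi-processing cluster on the host machine
client = local_cluster_setup()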

FIXED

  • Repaired auto-generated semantic version strings (use only release number + letter,
    remove local ".dev0" suffix from official release versions)

v0.2b

04 Aug 11:43

[v0.2b] - 2021-08-04

FIXED

  • Made ID fetching of crashed SLURM jobs more robust
  • Corrected faulty override of print/showwarning in case ACME was called
    from within SyNCoPy.
  • Cleaned up fetching of SLURM worker memory
  • Corrected keywords in CITATION.cff

v0.2a

18 May 19:32

[v0.2a] - 2021-05-18

NEW

  • Made ACME PEP 517 compliant: added pyproject.toml and modified setup.py
    accordingly
  • Added IBM POWER testing pipeline (via dedicated GitLab Runner)

CHANGED

  • New default SLURM partition set to "8GBXS" in esi_cluster_setup

REMOVED

  • Retired tox in slurmtest CI pipeline in favor of a "simple" pytest testing
    session due to file-locking problems of tox environments on NFS mounts

FIXED

  • Streamlined GitLab Runner setup: use cluster-wide conda instead of local
    installations (that differ slightly across runners) and leverage tox-conda
    to fetch pre-built dependencies
  • Opt-in pickling was not propagated correctly in daemon-reentry situations