Data in WepyHDF5 is laid out in a Struct-of-Arrays (SoA) format. This is well suited for a small number of long trajectory datasets and for larger chunk sizes of small data. However, queries that cut across all of the trajectories, e.g. getting the weights of all walkers at a given cycle, are unnecessarily slow.
There should be a convenient method that mitigates this cost, since this is a very common operation.
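To make the cost concrete, here is a minimal sketch of the cross-cutting query using plain h5py, assuming an SoA-style layout like `runs/<run_idx>/trajectories/<traj_idx>/weights` (the paths, group names, and the `weights_at_cycle` helper are illustrative assumptions, not the exact WepyHDF5 schema or API):

```python
import h5py
import numpy as np

def weights_at_cycle(path, run_idx, cycle_idx):
    """Collect every walker's weight at a single cycle."""
    with h5py.File(path, 'r') as h5:
        trajs_grp = h5['runs/{}/trajectories'.format(run_idx)]
        # One tiny read per trajectory dataset: each access pays HDF5
        # chunk-lookup (and possibly decompression) overhead just to
        # retrieve a single element. Assumes trajectory group names
        # are numeric strings like '0', '1', ...
        return np.array([trajs_grp[traj_id]['weights'][cycle_idx]
                         for traj_id in sorted(trajs_grp, key=int)])
```

Calling this once per cycle in a loop repeats that per-trajectory overhead for every cycle, which is where the slowdown comes from.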
Also see #34: the upcoming PR adds support for generating traces over cycles. This could be combined and batched with the ideas in #22 to improve performance when: A) the data points fit in memory, and B) you know ahead of time what you want to load.
For example, if you want to do a time-series average over all of the walkers, cycle by cycle, you know ahead of time that you want the data for every cycle (or some striding of them). So if the data (like weights) fits into memory, you could batch the reads and load it all at once, greatly amortizing the cost compared to doing it iteratively.
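A hedged sketch of what that batched loading could look like, again with plain h5py over the same illustrative layout (the `batched_weights` name and the `stride` parameter are assumptions for this sketch): read each trajectory's weights dataset once, optionally strided, and answer every per-cycle query from the in-memory array.

```python
import h5py
import numpy as np

def batched_weights(path, run_idx, stride=1):
    """Return an (n_cycles, n_trajs) array of walker weights."""
    with h5py.File(path, 'r') as h5:
        trajs_grp = h5['runs/{}/trajectories'.format(run_idx)]
        # One big sequential read per trajectory amortizes the per-access
        # overhead that the cycle-by-cycle loop pays repeatedly.
        # squeeze() tolerates weights stored as either (n_cycles,)
        # or (n_cycles, 1).
        cols = [trajs_grp[traj_id]['weights'][::stride].squeeze()
                for traj_id in sorted(trajs_grp, key=int)]
    return np.stack(cols, axis=1)

# e.g. the cycle-by-cycle average over all walkers, in one pass:
# mean_weights = batched_weights('results.wepy.h5', 0).mean(axis=1)
```

The same batching pattern would apply to any small per-cycle field, not just weights, as long as it fits in memory.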
Somewhat related to #22.