Data in WepyHDF5 is laid out in a Struct-of-Arrays (SoA) format. This is well suited for a small number of long trajectory datasets and for larger chunk sizes of small data. However, queries that cut across all of the trajectories, e.g. getting the weights of all walkers at a given cycle, are unnecessarily slow.
There should be a convenient method that mitigates this cost, since this is a very common operation.
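To make the cost concrete, here is a minimal sketch of the cross-cutting query using plain h5py, assuming an SoA-style layout like `runs/<run_idx>/trajectories/<traj_idx>/weights` (the paths, group names, and the `weights_at_cycle` helper are illustrative assumptions, not the exact WepyHDF5 schema or API):

```python
import h5py
import numpy as np

def weights_at_cycle(path, run_idx, cycle_idx):
    """Collect every walker's weight at a single cycle."""
    with h5py.File(path, 'r') as h5:
        trajs_grp = h5['runs/{}/trajectories'.format(run_idx)]
        # One tiny read per trajectory dataset: each access pays HDF5
        # chunk-lookup (and possibly decompression) overhead just to
        # retrieve a single element. Assumes trajectory group names
        # are numeric strings like '0', '1', ...
        return np.array([trajs_grp[traj_id]['weights'][cycle_idx]
                         for traj_id in sorted(trajs_grp, key=int)])
```

Calling this once per cycle in a loop repeats that per-trajectory overhead for every cycle, which is where the slowdown comes from.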
Also see #34: the upcoming PR adds support for generating traces over cycles. This could be combined and batched with the ideas in #22 to improve performance when: A) the data points fit in memory, and B) you know ahead of time what you want to load.
For example, if you want to do a time-series average over all of the walkers, cycle by cycle, you know ahead of time that you want the data for every cycle (or some striding of them). So if the data (like weights) fits into memory, you could batch the reads and load it all at once, greatly amortizing the cost compared to doing it iteratively.
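A hedged sketch of what that batched loading could look like, again with plain h5py over the same illustrative layout (the `batched_weights` name and the `stride` parameter are assumptions for this sketch): read each trajectory's weights dataset once, optionally strided, and answer every per-cycle query from the in-memory array.

```python
import h5py
import numpy as np

def batched_weights(path, run_idx, stride=1):
    """Return an (n_cycles, n_trajs) array of walker weights."""
    with h5py.File(path, 'r') as h5:
        trajs_grp = h5['runs/{}/trajectories'.format(run_idx)]
        # One big sequential read per trajectory amortizes the per-access
        # overhead that the cycle-by-cycle loop pays repeatedly.
        # squeeze() tolerates weights stored as either (n_cycles,)
        # or (n_cycles, 1).
        cols = [trajs_grp[traj_id]['weights'][::stride].squeeze()
                for traj_id in sorted(trajs_grp, key=int)]
    return np.stack(cols, axis=1)

# e.g. the cycle-by-cycle average over all walkers, in one pass:
# mean_weights = batched_weights('results.wepy.h5', 0).mean(axis=1)
```

The same batching pattern would apply to any small per-cycle field, not just weights, as long as it fits in memory.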
Somewhat related to #22.