Optimized cycle slicing routines #37

Open
salotz-sitx opened this issue Sep 21, 2021 · 0 comments

Comments

@salotz-sitx
Collaborator

Somewhat related to #22.

Data in WepyHDF5 is laid out in a Struct-of-Arrays (SoA) format. This is well suited to a small number of long trajectory datasets and to larger chunk sizes of small data. However, queries that cut across all of the trajectories, e.g. reading the weights of all walkers at a given cycle, are unnecessarily slow.
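To make the access pattern concrete, here is a toy sketch in plain Python. It is not the WepyHDF5 API: the dict-of-lists store and the function names are hypothetical stand-ins for one weights dataset per trajectory. It only illustrates why a per-trajectory query touches one dataset while a per-cycle query touches all of them.

```python
from typing import Dict, List

# Hypothetical stand-in for an SoA layout: one weights array per trajectory.
ToyStore = Dict[int, List[float]]


def make_toy_store(n_trajs: int, n_cycles: int) -> ToyStore:
    """Build a store where each trajectory owns its own list of per-cycle weights."""
    return {
        traj: [1.0 / n_trajs for _ in range(n_cycles)]
        for traj in range(n_trajs)
    }


def trajectory_weights(store: ToyStore, traj_idx: int) -> List[float]:
    """Along-trajectory query: reads from a single dataset."""
    return store[traj_idx]


def cycle_weights(store: ToyStore, cycle_idx: int) -> List[float]:
    """Cross-trajectory query: one small read from every trajectory's dataset."""
    return [store[traj][cycle_idx] for traj in sorted(store)]


store = make_toy_store(n_trajs=4, n_cycles=10)
weights_at_cycle_3 = cycle_weights(store, 3)
```

In this model the cost of `cycle_weights` grows with the number of trajectories even though it returns only one value per walker, which is the pattern described above.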

Since this is a very common operation, there should be a convenient method that mitigates this cost.

Also see #34, for which the upcoming PR adds support for generating traces for cycles. These traces could be batched using the ideas in #22 to improve performance when: A) many data points fit in memory, and B) you know ahead of time what you want to load.

For example, if you wanted to compute a time series average over all of the walkers cycle by cycle, you know ahead of time that you want the data for all of the cycles (or some stride of them). If that data (like the weights) fits into memory, you could batch it and load it all at once, greatly amortizing the cost compared to loading it iteratively.
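A rough sketch of the batching idea, again in plain Python rather than against WepyHDF5 itself. `CountingStore` and both functions are hypothetical; the store only counts read calls so the two strategies can be compared. It assumes the whole field fits in memory.

```python
from typing import List


class CountingStore:
    """Hypothetical stand-in for per-trajectory datasets that counts read calls."""

    def __init__(self, data: List[List[float]]) -> None:
        self._data = data  # indexed as data[traj][cycle]
        self.reads = 0

    @property
    def n_trajs(self) -> int:
        return len(self._data)

    def read(self, traj: int, start: int, stop: int, step: int = 1) -> List[float]:
        """Read a slice of one trajectory's dataset; each call counts as one read."""
        self.reads += 1
        return self._data[traj][start:stop:step]


def per_cycle_mean_iterative(store: CountingStore, n_cycles: int, stride: int = 1) -> List[float]:
    """Load one cycle at a time: one read per trajectory per cycle."""
    means = []
    for cycle in range(0, n_cycles, stride):
        values = [store.read(t, cycle, cycle + 1)[0] for t in range(store.n_trajs)]
        means.append(sum(values) / len(values))
    return means


def per_cycle_mean_batched(store: CountingStore, n_cycles: int, stride: int = 1) -> List[float]:
    """Load every requested cycle up front: one read per trajectory in total."""
    columns = [store.read(t, 0, n_cycles, stride) for t in range(store.n_trajs)]
    # zip(*columns) regroups the per-trajectory columns into per-cycle tuples.
    return [sum(values) / len(values) for values in zip(*columns)]


# Toy data: 3 trajectories, 6 cycles.
data = [[float(traj + cycle) for cycle in range(6)] for traj in range(3)]

iterative_store = CountingStore(data)
iterative_means = per_cycle_mean_iterative(iterative_store, n_cycles=6)

batched_store = CountingStore(data)
batched_means = per_cycle_mean_batched(batched_store, n_cycles=6)
```

Both functions return the same per-cycle means, but the iterative version issues `n_cycles * n_trajs` reads while the batched one issues `n_trajs`. How much that saves against a real HDF5 file would depend on chunking and caching, which this sketch does not model.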
