-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for dask distributed scheduler in quantum detector reader #267
Conversation
Thank you @CSSFrancis, as you suggested in #266 (comment), #11 was useful to figure out the details with the structured dtype - pretty much copy & paste the relevant bits! 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually a much smaller change than I thought it would be :). It looks good to me other than it might be nice to return the header as well although a warning that this could increase the loading time would be useful.
@@ -344,6 +355,10 @@ def load_mib_data( | |||
data = data.rechunk(chunks) | |||
|
|||
if return_headers: | |||
if distributed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can still return the header by just setting the key="header"
for a second memmap_distributed
call. It will add some time onto the saving of the dataset as the entire dataset might get loaded into ram with most of it thrown away.
Really what we should do is add things to a to_store
context manager and then call:
rosettasciio/rsciio/hspy/_api.py
Line 111 in 31bd677
da.store(data, dset) |
Only once. That will merge taskgraphs as necessary and might reduce the time for saving certain signals. I've thought about it for things like saving lazy markers of possibly creating a hs.save()
function for handling mulitple signals if you wanted to save multiple parts of some anaylsis efficently. This is a fairly abstract/higher level concept so maybe it would be seledomly used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this will most likely needed to be done at some point! I opened #269 to track it / add more usecases.
Progress of the PR
.mrc
using Distributed and with Direct Electron Metadata #162 and Add .seq format for DE 16 and Celeritas Camera #11,upcoming_changes
folder (seeupcoming_changes/README.rst
),docs/readthedocs.org:rosettasciio
build of this PR (link in github checks)