
Add support for dask distributed scheduler in quantum detector reader #267

Merged: 2 commits into hyperspy:main, Jun 5, 2024

Conversation

@ericpre ericpre (Member) commented May 29, 2024

Progress of the PR

@ericpre ericpre (Member, Author) commented May 29, 2024

Thank you @CSSFrancis, as you suggested in #266 (comment), #11 was useful to figure out the details with the structured dtype - pretty much copy & paste the relevant bits! 😄
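The structured-dtype trick mentioned above can be sketched as follows. This is an illustrative example only: the field names, header size (384 bytes) and frame shape (256×256 `uint16`) are placeholder values, not the actual quantum detector (.mib) layout used by the reader.

```python
import numpy as np

# Hypothetical layout: each record interleaves a raw header block with
# the frame data. A structured dtype lets np.memmap (or dask) view just
# the "data" field lazily, without copying the headers.
frame_dtype = np.dtype([
    ("header", np.uint8, (384,)),     # placeholder header size
    ("data", np.uint16, (256, 256)),  # placeholder frame shape
])

# Stand-in for np.memmap(filename, dtype=frame_dtype, mode="r")
buf = np.zeros(4, dtype=frame_dtype)

data = buf["data"]        # view of only the frames
headers = buf["header"]   # view of only the headers
print(data.shape)  # (4, 256, 256)
```

Because `buf["data"]` is a view into the same buffer, slicing out the frames costs no extra memory until the values are actually read.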

@CSSFrancis CSSFrancis (Member) left a comment

This is actually a much smaller change than I thought it would be :). It looks good to me, other than that it might be nice to return the header as well, although a warning that this could increase the loading time would be useful.

@@ -344,6 +355,10 @@ def load_mib_data(
data = data.rechunk(chunks)

if return_headers:
if distributed:
Member

You can still return the header by just setting key="header" for a second memmap_distributed call. It will add some time onto the saving of the dataset, as the entire dataset might get loaded into RAM with most of it thrown away.

Really what we should do is add things to a to_store context manager and then call:

da.store(data, dset)

Only once. That will merge task graphs as necessary and might reduce the time for saving certain signals. I've thought about it for things like saving lazy markers, or possibly creating a hs.save() function for handling multiple signals if you wanted to save multiple parts of some analysis efficiently. This is a fairly abstract/higher-level concept, so maybe it would be seldom used.
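The graph-merging idea above can be sketched with plain dask. Instead of two separate `da.store` calls (one for the data, one for the headers), the delayed stores are collected and computed together, so dask can merge the task graphs and read each source chunk only once. The NumPy target arrays here are stand-ins for HDF5/zarr datasets; the array contents are arbitrary.

```python
import numpy as np
import dask.array as da

# Two dask arrays that would normally be stored in separate passes.
data = da.arange(16, chunks=4)
headers = da.ones(4, chunks=2)

# Writable targets standing in for on-disk datasets.
data_target = np.empty(16, dtype=data.dtype)
header_target = np.empty(4, dtype=headers.dtype)

# compute=False returns a single Delayed wrapping both store operations,
# so one .compute() executes the merged graph in a single pass.
stored = da.store(
    [data, headers], [data_target, header_target], compute=False
)
stored.compute()
print(data_target[:4])  # [0 1 2 3]
```

When the two sources share upstream tasks (e.g. chunks read from the same file), merging the graphs avoids reading those chunks twice, which is the saving being discussed here.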

ericpre (Member, Author)

Yes, this will most likely need to be done at some point! I opened #269 to track it / add more use cases.

@ericpre ericpre merged commit 32ac8fc into hyperspy:main Jun 5, 2024
30 checks passed