Why do we need to set dask to single-threaded? #233

Closed
singledoggy opened this issue Dec 14, 2023 · 4 comments · Fixed by #236

Comments

@singledoggy

    # TODO: current workaround to dask thread problems
    import dask
    dask.config.set(scheduler='single-threaded')

I noticed "force single threaded mode" (#39), which says:

@SiMaria this makes the trick with the preprocessing obsolete, but forces single threads instead of the (theoretically) faster multi-threading. In our case, saving memory is more important than time so we should be good with this change for now

It may also have something to do with #37.

Currently I tend to use

client = Client(n_workers=16, threads_per_worker=1, memory_limit="8GB")

so this setting may override the default config and break my current workflow.

In my case, if I comment out dask.config.set(scheduler='single-threaded'), I don't get any errors when loading data with

o3 = salem.open_mf_wrf_dataset(date_strings)

I'll do more tests when I have time.

@fmaussion
Owner

Thanks for the report!

Salem has been in maintenance-only mode for quite a while now, so I wouldn't be surprised if a lot of the code is not up to date with dask/xarray standards...

@fmaussion
Owner

@singledoggy is there anything we can do here? I'll release a maintenance update soon, so it might be a good moment to tackle this.

@singledoggy
Author

Sorry for the late reply. I must clarify that my expertise in dask is limited, and therefore, the suggestions provided may not be entirely accurate.

I noticed that the HDF5 library is not thread-safe, so setting single-threaded mode is sensible. But I don't know why this conflicts with the settings of the Dask cluster.

But maybe it's better to temporarily set configuration values within a context manager?

# As a context manager
>>> with dask.config.set(scheduler='processes'):
...     x.sum().compute()
# Set globally
>>> dask.config.set(scheduler='processes')
>>> x.sum().compute()
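
To make the context-manager idea concrete, here is a minimal sketch of how a loading function could scope the single-threaded scheduler to its own work. The helper name `load_with_single_thread` and the toy `delayed` tasks are hypothetical stand-ins and are not salem code; the only real API used is `dask.config.set`, `dask.config.get`, `dask.delayed` and `dask.compute`.

```python
import dask
from dask import delayed


def load_with_single_thread(tasks):
    """Hypothetical helper: compute `tasks` with the synchronous
    scheduler, but only for the duration of this call."""
    with dask.config.set(scheduler="single-threaded"):
        return dask.compute(*tasks)


# Whatever the user configured before the call (None if nothing was set).
before = dask.config.get("scheduler", None)

# Toy tasks standing in for the real file reads.
tasks = [delayed(sum)([i, i + 1]) for i in range(3)]
results = load_with_single_thread(tasks)

# The previous setting is restored once the with-block exits.
after = dask.config.get("scheduler", None)
```

With this pattern, the scheduler setting that was active before the call is back in place afterwards, so a user's own configuration is not permanently replaced.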

@singledoggy
Author

I'm also not sure whether calling dask.config.set(scheduler='single-threaded') inside salem is a good approach. I checked xarray's open_mfdataset method, and it does not seem to do this.
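
Another option that leaves the global configuration alone, sketched here under the assumption that the caller controls the `compute` call, is to pass the scheduler per call through the `scheduler=` keyword. The `delayed` task below is a toy stand-in for real data loading.

```python
import dask
from dask import delayed

# Toy task standing in for a real computation.
total = delayed(sum)([1, 2, 3])

# The scheduler is chosen for this call only; dask.config is not modified.
value = total.compute(scheduler="single-threaded")
```

This keeps the choice of scheduler in the hands of whoever calls `compute`, rather than in the library.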
