Implement dask-specific methods on DataTree #9355

Open
TomNicholas opened this issue Aug 13, 2024 · 0 comments
Labels
contrib-help-wanted topic-dask topic-DataTree Related to the implementation of a DataTree class


What is your issue?

xr.Dataset implements a number of dask-specific methods, such as __dask_tokenize__ and __dask_graph__. It also has public methods that involve dask, such as .compute() and .load().

In DataTree, on the other hand, I haven't yet implemented any of these methods, or even written any tests that involve dask! You can probably still use dask with DataTree right now, but from dask's perspective a DataTree is presumably just a set of unconnected Dataset objects.

We could implement methods like .load() simply as a mapping over the tree, e.g.

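```python
def load(self):
    for node in self.subtree:
        if node.has_data:
            node.ds.load()  # each node's lazy data is computed separately
    return self  # mirror Dataset.load(), which returns self
```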

Most of this should already work (or be easy to get working) using map_over_subtree.
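To make the trade-off of the per-node loop concrete, here is a stdlib-only toy version of it. ToyNode, LazyVar and read_shared_source are hypothetical stand-ins, not xarray or dask API: ToyNode.ds plays the role of node.ds, and LazyVar.load() plays the role of computing a node's lazy data. Both nodes depend on one shared upstream step, and because each node is loaded independently, that step runs once per node. Separate dask compute calls generally don't reuse each other's intermediate results either, which is one argument for making the whole tree a single dask collection.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional

CALLS: list[str] = []  # records each run of the shared upstream step


def read_shared_source() -> list[int]:
    """Stand-in for an expensive upstream step that several nodes depend on."""
    CALLS.append("read")
    return [1, 2, 3]


@dataclass
class LazyVar:
    """Toy lazy variable: holds a thunk until .load() materializes it."""
    thunk: Callable[[], list[int]]
    values: Optional[list[int]] = None

    def load(self) -> LazyVar:
        if self.values is None:
            self.values = self.thunk()
        return self


@dataclass
class ToyNode:
    """Hypothetical stand-in for a DataTree node; `ds` maps names to LazyVars."""
    ds: dict[str, LazyVar] = field(default_factory=dict)
    children: dict[str, ToyNode] = field(default_factory=dict)

    @property
    def has_data(self) -> bool:
        return bool(self.ds)

    @property
    def subtree(self) -> Iterator[ToyNode]:
        # Yield this node first, then all of its descendants.
        yield self
        for child in self.children.values():
            yield from child.subtree

    def load(self) -> ToyNode:
        # Same shape as the loop in the issue: one independent load per node.
        for node in self.subtree:
            if node.has_data:
                for var in node.ds.values():
                    var.load()
        return self


tree = ToyNode(children={
    "a": ToyNode(ds={"x": LazyVar(lambda: [v * 2 for v in read_shared_source()])}),
    "b": ToyNode(ds={"y": LazyVar(lambda: [v * 10 for v in read_shared_source()])}),
})
tree.load()
print(CALLS)  # ['read', 'read']: the shared step ran once per node
```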

There are also special double-underscore methods defined on Dataset, which implement dask's custom collections interface:

https://docs.dask.org/en/stable/custom-collections.html

Xarray objects satisfy this collections protocol, so you can call dask.tokenize(xarray_thing), dask.compute(xarray_thing), and likewise dask.visualize and dask.persist.
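As a rough illustration of what a tree-level version of this protocol might look like, here is a stdlib-only toy. It is a sketch under stated assumptions, not a proposal for xarray's actual implementation, and it is not wired to dask itself. ToyTreeCollection, get and compute are hypothetical names: __dask_graph__ merges every node's graph into one mapping, __dask_keys__ lists one output key per node, __dask_postcompute__ reassembles the results by node path, and __dask_tokenize__ hashes the node paths and output keys deterministically. The hand-rolled get stands in for a dask scheduler and only understands tuple tasks of the form (func, *args) with string keys; __dask_postpersist__ (needed for persist) is omitted.

```python
from __future__ import annotations

import hashlib
from typing import Any, Mapping


def get(graph: Mapping[str, Any], keys: list[str]) -> list[Any]:
    """Toy synchronous scheduler: evaluates each key at most once."""
    cache: dict[str, Any] = {}

    def run(key: str) -> Any:
        if key in cache:
            return cache[key]
        task = graph[key]
        if isinstance(task, tuple) and task and callable(task[0]):
            func, *args = task
            # String arguments naming another key are replaced by its result.
            resolved = [run(a) if isinstance(a, str) and a in graph else a for a in args]
            result = func(*resolved)
        else:
            result = task  # a literal value
        cache[key] = result
        return result

    return [run(k) for k in keys]


class ToyTreeCollection:
    """Hypothetical tree collection: node path -> (node graph, output key)."""

    __dask_scheduler__ = staticmethod(get)

    def __init__(self, node_graphs: dict[str, tuple[dict[str, Any], str]]):
        self.node_graphs = node_graphs

    def __dask_graph__(self) -> dict[str, Any]:
        # One merged graph, so a task shared between nodes appears only once.
        merged: dict[str, Any] = {}
        for graph, _ in self.node_graphs.values():
            merged.update(graph)
        return merged

    def __dask_keys__(self) -> list[str]:
        return [key for _, key in self.node_graphs.values()]

    def __dask_postcompute__(self):
        # finalize(results, *extra_args) rebuilds a {path: value} mapping.
        def finalize(results, paths):
            return dict(zip(paths, results))
        return finalize, (list(self.node_graphs),)

    def __dask_tokenize__(self) -> str:
        # Deterministic: depends only on the sorted node paths and output keys.
        h = hashlib.sha256()
        for path in sorted(self.node_graphs):
            h.update(path.encode())
            h.update(self.node_graphs[path][1].encode())
        return h.hexdigest()


def compute(collection):
    """Toy version of what dask.compute does with a single collection."""
    graph = collection.__dask_graph__()
    keys = collection.__dask_keys__()
    results = collection.__dask_scheduler__(graph, keys)
    finalize, extra = collection.__dask_postcompute__()
    return finalize(results, *extra)


calls: list[str] = []


def load_raw() -> list[int]:
    calls.append("load_raw")
    return [1, 2, 3]


def scale(xs: list[int], factor: int) -> list[int]:
    return [x * factor for x in xs]


tree = ToyTreeCollection({
    "/a": ({"raw": (load_raw,), "a-out": (scale, "raw", 2)}, "a-out"),
    "/b": ({"raw": (load_raw,), "b-out": (scale, "raw", 10)}, "b-out"),
})
print(compute(tree))  # {'/a': [2, 4, 6], '/b': [10, 20, 30]}
print(calls)          # ['load_raw']
```

Because both nodes reference the same "raw" key in a single merged graph, load_raw runs only once, whereas with the per-node .load() loop each node's computation is independent.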


We could add these, but it would be nice if someone who understands the double-underscore dask methods really well took this on. @darothen helpfully started this in xarray-contrib/datatree#196, but it stalled.

@jrbourbeau are you/Coiled interested in submitting a PR to get xarray.DataTree fully integrated with dask?
