-
Notifications
You must be signed in to change notification settings - Fork 41
Conversation
for more information, see https://pre-commit.ci
Tag @TomNicholas, will work on testing this over the coming days as I have time. |
Here's a gist based on @jbusecke's CMIP6 demo showing the top-level integration of Still left to do are writing some test cases and further deep-diving to make sure that the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great @darothen ! Thank you for taking this on. There's just a couple of places that look a little off to me which I've highlighted.
# darothen: Are we sure that results_iter is ordered the same as | ||
# self.subtree? | ||
# self.subtree? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the iterable of DatasetView
come from? Presumably you are looping over the nodes, but where are you doing that? Like I don't understand where results
is passed in from.
new_datatree_dict = {node.path: node.ds.load(**kwargs) for node in self.subtree} | ||
return DataTree.from_dict(new_datatree_dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this will have the behavior you intend: DataTree.from_dict
will construct a completely new tree object, and then you are inserting whatever you get when you call DatasetView.load()
. It's not altering self
in-place.
Also I think it would be worth double-checking that DatasetView.load()
does what you expect too with regard to new objects / copying - I never really thought about that case when I wrote DatasetView
.
If you want to return the same tree but with all the data loaded I think you need to alter the current tree in-place instead of creating a new one, i.e. something like
for node in self.subtree:
self[node.path] = node.ds.load
though this might not fail gracefully...
new_datatree_dict = { | ||
node.path: node.ds.persist(**kwargs) for node in self.subtree | ||
} | ||
return DataTree.from_dict(new_datatree_dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment on .load()
applies to this too I think.
ds_tokens = {node.path: node.ds.__dask_tokenize__() for node in self.subtree} | ||
ds_tokens = {node.path: node.ds.__dask_tokenize__() for node in self.subtree} | ||
|
||
return normalize_token((type(self), ds_tokens)) | ||
|
||
return normalize_token((type(self), ds_tokens)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unintentional repetition of lines? The double return
shouldn't even be valid syntax should it??
graphs = {node.path: node.ds.__dask_graph__() for node in self.subtree} | ||
graphs = {node.path: node.ds.__dask_graph__() for node in self.subtree} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More unintentional repetition?
return [node.ds.__dask_keys__() for node in self.subtree] | ||
return [node.ds.__dask_keys__() for node in self.subtree] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here
Thanks for the quick review @TomNicholas, hoping to address later today or tomorrow. Note on the line repetition - looks like I screwed up a merge somewhere, will need to fix that separately. |
@darothen wondering if you had any time soon to revisit this PR? Would be great to get it in soon because Julius and I are writing another blog post about using datatree with dask on CMIP6 data. |
@TomNicholas I'm hacking on some projects this weekend, let me see if I can wrap things up. Apologies for the delay... it became very hectic at work shortly after the hackathon and I haven't had much time for side projects. |
Closing this as tracked upstream by pydata/xarray#9355 |
This is an initial implementation of the feature requested in #97.
The first implementation here very closely follows the implementation of these methods by
xarray.Dataset
. For the majority of the methods, this should work fine; we iterate over all the nodes in our tree, starting at the root, and perform the necessarydask.collections
API operation. However,__dask_post{compute,persist}__
is a bit more complicated; some additional testing is required to ensure that we're appropriately applying the available support utilities to re-construct our finalDataTree
without any superfluous work.pre-commit run --all-files
api.rst
docs/source/whats-new.rst