Indexing tree should create new tree #9341

Open
TomNicholas opened this issue Aug 13, 2024 · 1 comment
Labels
API design topic-DataTree Related to the implementation of a DataTree class


@TomNicholas
Member

What is your issue?

Inspired by this example in the stackstac documentation

lowcloud = stack[stack["eo:cloud_cover"] < 20]

We should ensure that you can index a DataTree with another (isomorphic) DataTree, so that the above operation would work even if stack is a DataTree instance.

This is another map_over_subtree-type operation, but it needs careful testing because the __getitem__ function in xarray objects already does so many different things. This won't work with the code as-is because at the moment the DataTree naively dispatches the __getitem__ call down to the wrapped dataset.

https://github.com/xarray-contrib/datatree/blob/cd0695160e261466efc7f51fece02ca9bea2101c/datatree/datatree.py#L238
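
To make the failure mode concrete, here is a minimal toy sketch of naive dispatch. ToyDataset and ToyTree are hypothetical stand-ins invented for illustration, not the real xarray or datatree classes, and the real __getitem__ handles many more key types than this.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset: maps variable names to lists."""

    def __init__(self, variables):
        self.variables = dict(variables)

    def __getitem__(self, key):
        # This toy dataset only understands variable names.
        if not isinstance(key, str):
            raise TypeError(f"unsupported key type: {type(key).__name__}")
        return self.variables[key]


class ToyTree:
    """Hypothetical stand-in for a tree node that wraps one dataset."""

    def __init__(self, dataset, children=None):
        self.dataset = dataset
        self.children = dict(children or {})

    def __getitem__(self, key):
        # Naive dispatch: forward every key to the wrapped dataset,
        # without checking whether the key is itself a tree.
        return self.dataset[key]


tree = ToyTree(ToyDataset({"eo:cloud_cover": [10, 30]}))
by_name = tree["eo:cloud_cover"]  # works: the key is a variable name

try:
    tree[tree]  # a tree used as the key reaches the dataset unchanged
    tree_key_failed = False
except TypeError:
    tree_key_failed = True
```

In this toy, a tree-valued key would need its own branch in ToyTree.__getitem__ that walks both trees together instead of forwarding to the dataset.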

@TomNicholas TomNicholas added API design topic-DataTree Related to the implementation of a DataTree class labels Aug 13, 2024
@TomNicholas
Member Author

To clarify, in order for this to work several things need to happen:

  1. stack["eo:cloud_cover"] needs to realise that "eo:cloud_cover" is neither a tree nor a group in the tree, but a variable name. It then needs to select the "eo:cloud_cover" variable from every node in the subtree and return a tree containing only those variables. That in itself requires something like #67 ("Dataset.concat should allow a string for the concat_over argument"), but ignoring nodes in which that variable is not present, at least for deeply-nested trees...
  2. stack["eo:cloud_cover"] < 20 needs to perform this comparison node-wise, returning a tree of results (hopefully this already works...).
  3. stack[stack["eo:cloud_cover"] < 20] needs to use the tree passed as the key to perform a node-wise indexing operation, returning a new tree. (Or we could just use .where.)
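
The three steps above can be sketched with a deliberately simplified toy model, in which a "tree" is a plain dict mapping node paths to nodes and a node is a dict mapping variable names to lists. The helper names select_variable, less_than and index_by_tree are hypothetical, and dropping nodes that lack the variable is an assumption rather than settled behaviour.

```python
def select_variable(tree, name):
    """Step 1: keep only `name` in each node, skipping nodes that lack it."""
    return {path: {name: node[name]} for path, node in tree.items() if name in node}


def less_than(tree, threshold):
    """Step 2: node-wise comparison, returning a tree of boolean masks."""
    return {
        path: {var: [v < threshold for v in values] for var, values in node.items()}
        for path, node in tree.items()
    }


def index_by_tree(tree, mask_tree, mask_name):
    """Step 3: node-wise boolean indexing, returning a new tree."""
    result = {}
    for path, node in tree.items():
        if path not in mask_tree:
            continue  # assumption: nodes without a mask are dropped
        mask = mask_tree[path][mask_name]
        # Assumes every variable in a node has the same length as the mask.
        result[path] = {
            var: [v for v, keep in zip(values, mask) if keep]
            for var, values in node.items()
        }
    return result


stack = {
    "/a": {"eo:cloud_cover": [5, 40, 12], "red": [1, 2, 3]},
    "/b": {"eo:cloud_cover": [25, 19], "red": [4, 5]},
    "/c": {"other": [0]},  # no "eo:cloud_cover" here
}

cover = select_variable(stack, "eo:cloud_cover")
mask = less_than(cover, 20)
lowcloud = index_by_tree(stack, mask, "eo:cloud_cover")
```

Each helper corresponds to one code path that the single user-facing line would have to pass through; the input tree is never modified, so the result is a new tree.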

Basically this is a really complicated usage example because it uses multiple different code-paths within __getitem__ sequentially within one line of user code 😅
