Re-implement map_over_datasets using group_subtrees #9636
Merged
Changes from 20 commits
bde843c
Add zip_subtrees for paired iteration over DataTrees
shoyer 23da8ca
fix pytype error
shoyer 4480e11
Merge branch 'main' into zip_subtree
shoyer 739573a
Re-implement map_over_datasets
shoyer bed8cba
Merge branch 'main' into zip_subtree_map
shoyer 4353581
fix typing of map_over_datasets
shoyer 1aa7601
add group_subtrees
shoyer 89ea46e
wip fixes
shoyer 16ef362
Merge branch 'main' into zip_subtree_map
shoyer 93ba3a1
update isomorphic
shoyer e4bc1a0
documentation and API change for map_over_datasets
shoyer 3b5a41b
mypy fixes
shoyer 5cc7e8f
fix test
shoyer 8ef0522
diff formatting
shoyer 1f931ff
more mypy
shoyer 5a99811
doc fix
shoyer bd976f6
more doc fix
shoyer dd0280d
add api docs
shoyer 1f07b63
add utility for joining path on windows
shoyer ab81dcf
docstring
shoyer 74119c3
add an overload for two return values from map_over_datasets
shoyer b93c46e
partial fixes per review
shoyer fca6780
fixes per review
shoyer b681181
remove a couple of xfails
shoyer b9b3f3e
Merge branch 'main' into zip_subtree_map
shoyer
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -362,21 +362,26 @@ This returns an iterable of nodes, which yields them in depth-first order. | |
for node in vertebrates.subtree: | ||
print(node.path) | ||
|
||
A very useful pattern is to use :py:class:`~xarray.DataTree.subtree` conjunction with the :py:class:`~xarray.DataTree.path` property to manipulate the nodes however you wish, | ||
then rebuild a new tree using :py:meth:`xarray.DataTree.from_dict()`. | ||
Similarly, :py:class:`~xarray.DataTree.subtree_with_keys` returns an iterable of | ||
relative paths and corresponding nodes. | ||
|
||
A very useful pattern is to iterate over :py:class:`~xarray.DataTree.subtree_with_keys` | ||
to manipulate nodes however you wish, then rebuild a new tree using | ||
:py:meth:`xarray.DataTree.from_dict()`. | ||
For example, we could keep only the nodes containing data by looping over all nodes, | ||
checking if they contain any data using :py:class:`~xarray.DataTree.has_data`, | ||
then rebuilding a new tree using only the paths of those nodes: | ||
|
||
.. ipython:: python | ||
|
||
non_empty_nodes = {node.path: node.dataset for node in dt.subtree if node.has_data} | ||
non_empty_nodes = { | ||
path: node.dataset for path, node in dt.subtree_with_keys if node.has_data | ||
} | ||
xr.DataTree.from_dict(non_empty_nodes) | ||
|
||
You can see this tree is similar to the ``dt`` object above, except that it is missing the empty nodes ``a/c`` and ``a/c/d``. | ||
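
To make the filter-and-rebuild pattern concrete without depending on xarray, here is a minimal pure-Python sketch. The `Node` class, the `walk` helper, and the use of `"."` as the root's relative path are all assumptions made for illustration; they are not xarray's API or implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a tree node; not xarray's DataTree."""

    data: Optional[dict] = None
    children: Dict[str, "Node"] = field(default_factory=dict)


def walk(node: Node, prefix: str = "") -> Iterator[Tuple[str, Node]]:
    """Yield (relative path, node) pairs, each parent before its children."""
    # "." for the root is a convention chosen for this sketch.
    yield (prefix or ".", node)
    for name, child in node.children.items():
        child_path = f"{prefix}/{name}" if prefix else name
        yield from walk(child, child_path)


root = Node(
    children={
        "a": Node(data={"x": 1}, children={"c": Node(children={"d": Node()})}),
        "b": Node(data={"x": 2}),
    }
)

# Keep only the paths whose node holds data, as in the docs example.
non_empty = {path: node.data for path, node in walk(root) if node.data is not None}
print(non_empty)
```

The resulting dictionary is keyed by path, which is the shape a `from_dict`-style constructor needs to rebuild a tree.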
|
||
(If you want to keep the name of the root node, you will need to add the ``name`` kwarg to :py:class:`~xarray.DataTree.from_dict`, i.e. ``DataTree.from_dict(non_empty_nodes, name=dt.root.name)``.) | ||
(If you want to keep the name of the root node, you will need to add the ``name`` kwarg to :py:class:`~xarray.DataTree.from_dict`, i.e. ``DataTree.from_dict(non_empty_nodes, name=dt.name)``.) | ||
Good catch! |
||
|
||
.. _manipulating trees: | ||
|
||
|
@@ -573,38 +578,85 @@ Then calculate the RMS value of these signals: | |
|
||
.. _multiple trees: | ||
|
||
We can also use the :py:meth:`~xarray.map_over_datasets` decorator to promote a function which accepts datasets into one which | ||
accepts datatrees. | ||
We can also use :py:func:`~xarray.map_over_datasets` to apply a function over | ||
trees appearing in any positional argument. | ||
|
||
|
||
Operating on Multiple Trees | ||
--------------------------- | ||
|
||
The examples so far have involved mapping functions or methods over the nodes of a single tree, | ||
but we can generalize this to mapping functions over multiple trees at once. | ||
|
||
Iterating Over Multiple Trees | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
To iterate over the corresponding nodes in multiple trees, use | ||
:py:func:`~xarray.group_subtrees` instead of | ||
:py:class:`~xarray.DataTree.subtree_with_keys`: | ||
|
||
.. ipython:: python | ||
|
||
dt1 = xr.DataTree.from_dict({"a": xr.Dataset({"x": 1}), "b": xr.Dataset({"x": 2})}) | ||
dt2 = xr.DataTree.from_dict( | ||
{"a": xr.Dataset({"x": 10}), "b": xr.Dataset({"x": 20})} | ||
) | ||
for path, (node1, node2) in xr.group_subtrees(dt1, dt2): | ||
print(path, int(node1["x"]), int(node2["x"])) | ||
|
||
|
||
To rebuild a tree after applying operations at each node, use | ||
:py:meth:`xarray.DataTree.from_dict()`: | ||
|
||
.. ipython:: python | ||
|
||
result = {} | ||
for path, (node1, node2) in xr.group_subtrees(dt1, dt2): | ||
result[path] = node1.dataset + node2.dataset | ||
xr.DataTree.from_dict(result) | ||
|
||
Or apply a function directly to paired datasets at every node using | ||
:py:func:`xarray.map_over_datasets`: | ||
|
||
.. ipython:: python | ||
|
||
xr.map_over_datasets(lambda x, y: x + y, dt1, dt2) | ||
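
The relationship between paired iteration and mapping can be sketched in plain Python. This is only an illustration under stated assumptions: trees are modelled as dictionaries from path to a `{name: value}` mapping, and `group_by_path` and `map_over_paths` are hypothetical helpers, not the xarray functions themselves.

```python
from typing import Callable, Dict, Iterator, Tuple

Tree = Dict[str, dict]  # hypothetical model: path -> {variable name: value}


def group_by_path(*trees: Tree) -> Iterator[Tuple[str, tuple]]:
    """Yield (path, nodes) pairs; fail if the trees' paths differ."""
    first = trees[0]
    for other in trees[1:]:
        if set(other) != set(first):
            raise ValueError("trees do not have the same paths")
    for path in first:
        yield path, tuple(tree[path] for tree in trees)


def map_over_paths(func: Callable[..., dict], *trees: Tree) -> Tree:
    """Apply func to the corresponding nodes and rebuild a path-keyed tree."""
    return {path: func(*nodes) for path, nodes in group_by_path(*trees)}


def add(x: dict, y: dict) -> dict:
    """Add two nodes variable by variable."""
    return {name: x[name] + y[name] for name in x}


t1 = {"a": {"x": 1}, "b": {"x": 2}}
t2 = {"a": {"x": 10}, "b": {"x": 20}}
result = map_over_paths(add, t1, t2)
print(result)  # {'a': {'x': 11}, 'b': {'x': 22}}
```

In this sketch the mapping helper is just the explicit loop from the previous example folded into one call, which is the design the PR title describes.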
|
||
Comparing Trees for Isomorphism | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
For it to make sense to map a single non-unary function over the nodes of multiple trees at once, | ||
each tree needs to have the same structure. Specifically two trees can only be considered similar, or "isomorphic", | ||
if they have the same number of nodes, and each corresponding node has the same number of children. | ||
We can check if any two trees are isomorphic using the :py:meth:`~xarray.DataTree.isomorphic` method. | ||
each tree needs to have the same structure. Specifically, two trees can only be considered similar, | ||
or "isomorphic", if the full paths to all of their descendant nodes are the same. | ||
|
||
Applying :py:func:`~xarray.group_subtrees` to trees with different structure | ||
raises :py:class:`~xarray.TreeIsomorphismError`: | ||
|
||
.. ipython:: python | ||
:okexcept: | ||
|
||
dt1 = xr.DataTree.from_dict({"a": None, "a/b": None}) | ||
dt2 = xr.DataTree.from_dict({"a": None}) | ||
dt1.isomorphic(dt2) | ||
tree = xr.DataTree.from_dict({"a": None, "a/b": None, "a/c": None}) | ||
simple_tree = xr.DataTree.from_dict({"a": None}) | ||
for _ in xr.group_subtrees(tree, simple_tree): | ||
... | ||
|
||
We can also explicitly check if any two trees are isomorphic using the :py:meth:`~xarray.DataTree.isomorphic` method: | ||
|
||
.. ipython:: python | ||
|
||
tree.isomorphic(simple_tree) | ||
|
||
Corresponding tree nodes do not need to have the same data in order to be considered isomorphic: | ||
|
||
.. ipython:: python | ||
|
||
tree_with_data = xr.DataTree.from_dict({"a": xr.Dataset({"foo": 1})}) | ||
simple_tree.isomorphic(tree_with_data) | ||
|
||
dt3 = xr.DataTree.from_dict({"a": None, "b": None}) | ||
dt1.isomorphic(dt3) | ||
They also do not need to define child nodes in the same order: | ||
|
||
dt4 = xr.DataTree.from_dict({"A": None, "A/B": xr.Dataset({"foo": 1})}) | ||
dt1.isomorphic(dt4) | ||
.. ipython:: python | ||
|
||
If the trees are not isomorphic a :py:class:`~xarray.TreeIsomorphismError` will be raised. | ||
Notice that corresponding tree nodes do not need to have the same name or contain the same data in order to be considered isomorphic. | ||
reordered_tree = xr.DataTree.from_dict({"a": None, "a/c": None, "a/b": None}) | ||
tree.isomorphic(reordered_tree) | ||
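
The path-based definition of isomorphism used above can be checked with a few lines of plain Python. This is a sketch only: trees are modelled as path-keyed dictionaries, and `all_paths` and `same_structure` are hypothetical helpers rather than anything xarray provides.

```python
from typing import Iterable, Set


def all_paths(paths: Iterable[str]) -> Set[str]:
    """Expand each path to include its ancestors, e.g. "a/b" implies "a"."""
    expanded = set()
    for path in paths:
        parts = path.split("/")
        for i in range(1, len(parts) + 1):
            expanded.add("/".join(parts[:i]))
    return expanded


def same_structure(first: dict, second: dict) -> bool:
    """Two trees match when their full sets of node paths are equal."""
    return all_paths(first) == all_paths(second)


tree = {"a": None, "a/b": None, "a/c": None}
simple_tree = {"a": None}
tree_with_data = {"a": {"foo": 1}}
reordered_tree = {"a": None, "a/c": None, "a/b": None}

print(same_structure(tree, simple_tree))  # False: "a/b" and "a/c" are missing
print(same_structure(simple_tree, tree_with_data))  # True: data is ignored
print(same_structure(tree, reordered_tree))  # True: child order is ignored
```

Because the comparison is on sets of paths, neither node contents nor child order affects the outcome, which mirrors the three cases shown in the documentation change.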
|
||
Arithmetic Between Multiple Trees | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
So we have a new property:
Is there a good place we could document these properties?