-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate datatreee assertions/extensions/formatting #8967
Migrate datatreee assertions/extensions/formatting #8967
Conversation
@@ -71,37 +71,6 @@ def check_isomorphic( | |||
raise TreeIsomorphismError("DataTree objects are not isomorphic:\n" + diff) | |||
|
|||
|
|||
def diff_treestructure(a: DataTree, b: DataTree, require_names_equal: bool) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this into xarray/core/formatting.py
to avoid a circular dependency issue.
`node` | ||
:any:`NodeMixin` object. | ||
It is up to the user to assemble these parts to a whole. | ||
>>> from xarray import Dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The examples in this documentation string are a bit shorter than the originals from anytree
. That's because using node.children.values()
gets a ValuesView
which isn't compatible with iterables like reversed
that alter the order of the items in the iterable.
@@ -1,6 +1,6 @@ | |||
import textwrap | |||
|
|||
from xarray import Dataset | |||
from xarray.core.dataset import Dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was causing another circular dependency issue. @flamingbear - just FYI, for when you are tweaking ops.py
.
@@ -1,3 +1,4 @@ | |||
# TODO: Add assert_isomorphic when making DataTree API public |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed we didn't want to surface assert_isomorphic
until everything else was public. Does that sound good to you @TomNicholas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought there was an issue collecting things we need to do to put a final bow on things, but I'm not finding it. Should we add it to #8572? or is that overkill?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find a dedicated issue for that either. Yes lets' just make an explicit list under Expose datatree API publicly. on #8572 (I'll do that now)
xarray/testing/assertions.py
Outdated
@@ -81,12 +134,19 @@ def assert_equal(a, b): | |||
assert a.equals(b), formatting.diff_dataset_repr(a, b, "equals") | |||
elif isinstance(a, Coordinates): | |||
assert a.equals(b), formatting.diff_coords_repr(a, b, "equals") | |||
elif isinstance(a, DataTree): | |||
from_root = kwargs.get("from_root", True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would people be more comfortable if I was doing some type checking here for from_root
being a bool
?
@@ -5,9 +5,14 @@ | |||
import pytest | |||
|
|||
import xarray as xr | |||
|
|||
# TODO: Remove imports in favour of xr.DataTree etc, once part of public API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also hoping that once we can use xr.DataTree
that all the type annotations that mypy
has insisted on in the tests might be able to be removed (e.g., dt: DataTree = DataTree()
)
assert_identical(elders, expected) | ||
|
||
|
||
class TestDSMethodInheritance: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This felt like a good place for adding these tests. But I could move them to another module if that's a preference (just probably not under the name test_dataset_api.py
, which might be confusing outside of a specifically datatree
oriented context).
xarray/testing/assertions.py
Outdated
""" | ||
Two DataTrees are considered isomorphic if every node has the same number of children. | ||
|
||
Nothing about the data in each node is checked. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing about the data in each node is checked. | |
Nothing about the data or attrs in each node is checked. |
xarray/testing/assertions.py
Outdated
Arrays with NaN in the same location are considered equal. DataTrees are | ||
equal if they have isomorphic node structures, with matching node names, | ||
and if they have matching variables and coordinates, all of which are | ||
equal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The definition here of equal/allclose/identical
is "isomorphic, and also every set of corresponding nodes in the trees are equal/allclose/identical". It would be good to reword this to indicate that assert_equal
for trees is essentially just assert_equal
for Dataset
but mapped over the trees.
xarray/testing/assertions.py
Outdated
**kwargs : dict | ||
Additional keyword arguments passed on to DataTree.equals. Specifically, | ||
from_root, which defaults True, and will check the whole tree above | ||
the given node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why assert_equal
has **kwargs
at all - they don't seem to be used?
assert_allclose
accepts specific named tolerance kwargs, shouldn't we just add from_root
explicitly to all of these functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have misunderstood the conversation in our tag-up today, but it sounded like **kwargs
was a potentially preferred option to explicitly stating from_root
, where from_root
is only used by assert_equal
or assert_identical
when dealing with a DataTree
. But happy to explicitly name from_root
as an optional argument if that seems better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgetting about from_root
for a moment, I just don't understand why xarray's existing assert_equal
method already accepts **kwargs
then does nothing with them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow here - I added **kwargs
to both assert_equal
and assert_identical
in this PR. Then inside the condition for isinstance(a, DataTree)
I try to grab from_root
from those kwargs
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I'm being stupid, ignore the comment about it already being there (it wasn't).
Why are you adding **kwargs
though? Did we decide to do that in the meeting? I thought all we decided was that assert_*
would need a from_root
arg, even if that didn't do anything for non-DataTree
inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been suggesting that... the idea was that having it for non-DataTree
objects might be confusing, but I guess **kwargs
makes tab-completion less useful (but other than that there should not be any difference).
Edit: see the meeting notes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah my bad for somehow missing that suggestion initially.
Still, I feel like hiding a kwarg away is more confusing than just showing it and saying it is only used if two DataTree
objects are passed... For example apply_ufunc
has arguments that are only used if the input is of a certain type.
Another thing we could do here is use typing.overloads
to ensure that passing from_root
when a
and b
are not both DataTree
s would fail static type checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - I was just thinking about using typing.overloads
- I'll have a go at that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
|
||
@overload | ||
def assert_equal(a, b): ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially tried to specify all the individual overloads (e.g., assert_equal(a: Dataset, b: Dataset)
, etc). That led to issues with the return values from DataTree.__getitem__
, which are: DataTree | DataArray.
Hopefully this hits enough of the spot, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the discussion and the resulting code, looks good to me now. Thanks.
xarray/core/formatting.py
Outdated
f"Left and right {type(a).__name__} objects are not {_compat_to_str(compat)}" | ||
] | ||
|
||
# TODO check root parents? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# TODO check root parents? |
This should be covered by the parent assertions having a from_root
kwarg.
xarray/tests/test_formatting.py
Outdated
def test_print_datatree(self, simple_datatree): | ||
dt = simple_datatree | ||
print(dt) | ||
|
||
# TODO work out how to test something complex like this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a placeholder for something better. All this does right now is check an error is raised. But we might not need more tests than the ones already in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pulled this comment, but let me know if you actually wanted me to pull any of the tests, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just pull this test, as we are already testing that the __str__
method doesn't error in other tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of very small comments about removing outdated comments / tests, after that I will merge!
This PR continues the overall work of migrating DataTree into xarray.
xarray/core/datatree_render.py
is the renamed version ofxarray/datatree_/datatree/render.py
.xarray/core/extensions.py
now contains functionality fromxarray/datatree_/datatree/extensions.py
.xarray/core/formatting.py
now contains functionality fromxarray/datatree_/datatree/formatting.py
.xarray/tests/test_datatree.py
now contains tests fromxarray/datatree_/datatree/tests/test_dataset_api.py
.xarray/testing/assertions.py
now contains functionality from/xarray/datatree_/datatree/testing.py
.I had also meant to get to
common.py
and what's left ofio.py
, but I've got a hefty couple of days of meetings ahead, so I wanted to get this progress into PR before that happens. @flamingbear or I can follow up with the remaining things in a separate PR. (Also this PR is already getting a little big, so maybe it's already got enough in it)Tests addedUser visible changes (including notable bug fixes) are documented inwhats-new.rst
New functions/methods are listed inapi.rst