Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refine DataTree.equals and DataTree.identical #9473

Closed
wants to merge 3 commits into from

Conversation

shoyer
Copy link
Member

@shoyer shoyer commented Sep 10, 2024

.identical() now checks for inherited coordinates being defined at the same level ("identical on disk"), whereas .equals now checks only that inherited coordinates are equal, even if defined redundantly ("good enough for xarray's alignment").

I've also removed the from_root argument from identical and equals, though we could bring it back if @TomNicholas feels strongly. It felt like a weird thing to check -- if you want to check from the root, why not just directly compare the root objects instead?

We should probably also check that .path matches, in the case of .identical only. Currently this is also checked for .equals, which I would propose we remove.

@TomNicholas TomNicholas added the topic-DataTree Related to the implementation of a DataTree class label Sep 10, 2024
other: DataTree,
nodes_match: Callable[[DataTree, DataTree], bool],
) -> bool:
if not self.isomorphic(other, from_root=True, strict_names=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if there is a problem with removing the from_root argument but keeping it fixed to True inside check_isomorphic...

Let's imagines I assert_equal two subtrees that have different parents. All the nodes in the subtrees could match, but if the structures above the node where the subtree starts are not isomorphic, it will return False. That doesn't seem like desired behaviour to me - it leaves no way to check that the two subtrees are equal.

Comment on lines +1293 to +1294
# include inherited coordinates only at the root; otherwise require that
# everything is also defined at the same level
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do this for identical but not for equals?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you did talk about this in your comment above


See Also
--------
Dataset.equals
DataTree.isomorphic
DataTree.identical
"""
if not self.isomorphic(other, from_root=from_root, strict_names=True):
to_ds = lambda x: x._to_dataset_view(inherited=True, rebuild_dims=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be simpler (and clearer) to use inherited=False? The nodes are checked in depth-first order anyway, so if there is a discrepancy between inherited coordinates it will be discovered at the top of the subtree before checking the children.

@TomNicholas
Copy link
Member

It felt like a weird thing to check -- if you want to check from the root, why not just directly compare the root objects instead?

I feel like maybe there was some reason why I needed this, in map_over_subtree maybe?

@TomNicholas
Copy link
Member

I feel like maybe there was some reason why I needed this, in map_over_subtree maybe?

map_over_subtree only uses from_root=False, so I guess we can get rid of this argument 🤷

first_tree, other_tree, require_names_equal=False, check_from_root=False

@TomNicholas
Copy link
Member

Closing this based on the comment in #9215 (comment). Feel free to re-open if you think we do want to keep some part of this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-DataTree Related to the implementation of a DataTree class
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataTree.identical should consider checking that coordinates are defined at the same level
2 participants