Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating a DataTree should not modify parent and children in-place #9196

Closed
shoyer opened this issue Jul 1, 2024 · 5 comments
Closed

Creating a DataTree should not modify parent and children in-place #9196

shoyer opened this issue Jul 1, 2024 · 5 comments
Labels
API design topic-DataTree Related to the implementation of a DataTree class

Comments

@shoyer
Copy link
Member

shoyer commented Jul 1, 2024

What is your issue?

This violates the well-established Python convention that functions either have a return value or modify arguments in-place. They never do both.

Examples:

  • Modifying a parent:
from xarray.core.datatree import DataTree

root = DataTree()
child = DataTree(name='child', parent=root)
print(root)
# <xarray.DataTree>
# Group: /
# └── Group: /child
  • Modifying children:
from xarray.core.datatree import DataTree

child = DataTree()
root = DataTree(children={'child': child})
print(child)
# <xarray.DataTree 'child'>
# Group: /child

This particularly surprising if a DataTree argument is reused, e.g.,

from xarray.core.datatree import DataTree

child = DataTree()
root = DataTree(children={'child': child})
root2 = DataTree(children={'child2': child})
print(child)  # attached to root2
# <xarray.DataTree 'child2'>
# Group: /child2
print(root)  # now empty!
# <xarray.DataTree>
# Group: /

Here's my suggestion:

  1. We should make the DataTree constructor make shallow copies of its DataTree arguments.
  2. We should consider getting rid of the parent constructor argument. It is redundant with children, and unlike children, parent nodes don't show up in the DataTree repr, so it's more surprising to see them copied.

CC @TomNicholas

@shoyer shoyer added API design topic-DataTree Related to the implementation of a DataTree class labels Jul 1, 2024
@TomNicholas
Copy link
Member

Good point, this wasn't really an intentional breaking of that convention on my part.

Happy with either / both of your suggestions.

I think I originally put parent in the constructor just so that all attributes were present in the signature, but there are multiple other ways to set the parent once the subtree has been created.

@TomNicholas
Copy link
Member

To be clear the aim here is to do both of @shoyer 's suggestions, probably in 2 separate PRs

@flamingbear
Copy link
Member

I'm just finding this issue now. I Don't know why I didn't see it. I assume, this is going to make a bunch of changes required in the docs. Can you summarize what I'll need to look for?
e.g I assume this isn't true any longer.

homer = xr.DataTree(name="Homer", children={"Bart": bart, "Lisa": lisa})
The nodes representing Bart and Lisa are now connected - we can confirm their sibling rivalry by examining the [siblings](https://github.com/pydata/xarray/generated/xarray.DataTree.siblings.html#xarray.DataTree.siblings) property:

list(bart.siblings)
Out[5]: ['Lisa']

That assignment to homer will no longer make bart's sibling lisa?
What am I missing?

@shoyer
Copy link
Member Author

shoyer commented Jul 31, 2024

e.g I assume this isn't true any longer.

homer = xr.DataTree(name="Homer", children={"Bart": bart, "Lisa": lisa})
The nodes representing Bart and Lisa are now connected - we can confirm their sibling rivalry by examining the [siblings](https://github.com/pydata/xarray/generated/xarray.DataTree.siblings.html#xarray.DataTree.siblings) property:

list(bart.siblings)
Out[5]: ['Lisa']

That assignment to homer will no longer make bart's sibling lisa? What am I missing?

Yes, you have this correct.

homer.children['Bart'] will have the sibling Lisa, but not the original bart object.

@TomNicholas
Copy link
Member

This should be closed by #9297.

TomNicholas added a commit to TomNicholas/xarray that referenced this issue Sep 11, 2024
TomNicholas added a commit to TomNicholas/xarray that referenced this issue Sep 11, 2024
TomNicholas added a commit that referenced this issue Sep 12, 2024
* test from #9196 but on TreeNode

* move assignment and copying of children to TreeNode constructor

* move copy methods over to TreeNode

* change copying behaviour to be in line with #9196

* explicitly test that ._copy_subtree works for TreeNode

* reimplement ._copy_subtree using recursion

* change treenode.py tests to match expected non-in-place behaviour

* fix but created in DataTree.__init__

* add type hints for Generic TreeNode back in

* update typing of ._copy_node

* remove redunant setting of _name
hollymandel pushed a commit to hollymandel/xarray that referenced this issue Sep 23, 2024
* test from pydata#9196 but on TreeNode

* move assignment and copying of children to TreeNode constructor

* move copy methods over to TreeNode

* change copying behaviour to be in line with pydata#9196

* explicitly test that ._copy_subtree works for TreeNode

* reimplement ._copy_subtree using recursion

* change treenode.py tests to match expected non-in-place behaviour

* fix but created in DataTree.__init__

* add type hints for Generic TreeNode back in

* update typing of ._copy_node

* remove redunant setting of _name
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API design topic-DataTree Related to the implementation of a DataTree class
Projects
None yet
Development

No branches or pull requests

3 participants