Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataTree.from_dict fails to insert parent if parent/child inserted first. #9276

Closed
5 tasks done
flamingbear opened this issue Jul 24, 2024 · 1 comment · Fixed by #9292
Closed
5 tasks done

DataTree.from_dict fails to insert parent if parent/child inserted first. #9276

flamingbear opened this issue Jul 24, 2024 · 1 comment · Fixed by #9292
Labels
bug needs triage Issue that has not been reviewed by xarray team member topic-DataTree Related to the implementation of a DataTree class

Comments

@flamingbear
Copy link
Member

flamingbear commented Jul 24, 2024

What happened?

DataTree.from_dict() fails to initialize if a parent/child node is inserted before a parent node

lineage = DataTree.from_dict(
    {
        "/": xr.Dataset({"age": 83}),
        "/Homer": xr.Dataset({"age": 39}),
        "/Homer/Bart": xr.Dataset({"age": 10}),
    }
)
print(lineage)

works as expected:

<xarray.DataTree>
Group: /
│   Dimensions:  ()
│   Data variables:
│       age      int64 8B 83
└── Group: /Homer
    │   Dimensions:  ()
    │   Data variables:
    │       age      int64 8B 39
    └── Group: /Homer/Bart
            Dimensions:  ()
            Data variables:
                age      int64 8B 10

If you reverse the dictionary

DataTree.from_dict({
    "/Homer/Bart": xr.Dataset({"age": 10}),        
    "/Homer": xr.Dataset({"age": 39}),
    "/": xr.Dataset({"age": 83}),
})

It fails:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[40], line 1
----> 1 DataTree.from_dict({
      2     "/Homer/Bart": xr.Dataset({"age": 10}),        
      3     "/Homer": xr.Dataset({"age": 39}),
      4     "/": xr.Dataset({"age": 83}),
      5 })

File ~/projects/data-services/contrib/xarray/xarray/core/datatree.py:1115, in DataTree.from_dict(cls, d, name)
   1113         else:
   1114             new_node = cls(name=node_name, data=data)
-> 1115         obj._set_item(
   1116             path,
   1117             new_node,
   1118             allow_overwrite=False,
   1119             new_nodes_along_path=True,
   1120         )
   1122 return obj

File ~/projects/data-services/contrib/xarray/xarray/core/treenode.py:552, in TreeNode._set_item(self, path, item, new_nodes_along_path, allow_overwrite)
    550         current_node._set(name, item)
    551     else:
--> 552         raise KeyError(f"Already a node object at path {path}")
    553 else:
    554     current_node._set(name, item)

KeyError: 'Already a node object at path /Homer'

This might be easy, but unrelated to fixing the docs so I'm filing a bug report.

What did you expect to happen?

I expect that the dictionary order should not affect the DataTree creation.

Minimal Complete Verifiable Example

from xarray.core.datatree import DataTree

DataTree.from_dict({
    "/Homer/Bart": xr.Dataset({"age": 10}),        
    "/Homer": xr.Dataset({"age": 39}),
    "/": xr.Dataset({"age": 83}),
})

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[40], line 1
----> 1 DataTree.from_dict({
      2     "/Homer/Bart": xr.Dataset({"age": 10}),        
      3     "/Homer": xr.Dataset({"age": 39}),
      4     "/": xr.Dataset({"age": 83}),
      5 })

File ~/projects/data-services/contrib/xarray/xarray/core/datatree.py:1115, in DataTree.from_dict(cls, d, name)
   1113         else:
   1114             new_node = cls(name=node_name, data=data)
-> 1115         obj._set_item(
   1116             path,
   1117             new_node,
   1118             allow_overwrite=False,
   1119             new_nodes_along_path=True,
   1120         )
   1122 return obj

File ~/projects/data-services/contrib/xarray/xarray/core/treenode.py:552, in TreeNode._set_item(self, path, item, new_nodes_along_path, allow_overwrite)
    550         current_node._set(name, item)
    551     else:
--> 552         raise KeyError(f"Already a node object at path {path}")
    553 else:
    554     current_node._set(name, item)

KeyError: 'Already a node object at path /Homer'

Anything else we need to know?

No response

Environment

/Users/savoie/.pyenv/versions/miniconda3-4.7.12/envs/xarray-tests/lib/python3.11/site-packages/_distutils_hack/init.py:26: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit: 068bab2
python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:45:13) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.6.1.dev88+g068bab28b.d20240724
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.0
netCDF4: 1.6.5
pydap: installed
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: 2.17.2
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: 3.9.0
bottleneck: 1.3.8
dask: 2024.4.2
distributed: 2024.4.2
matplotlib: 3.8.4
cartopy: 0.23.0
seaborn: 0.13.2
numbagg: 0.8.1
fsspec: 2024.3.1
cupy: None
pint: 0.23
sparse: 0.15.1
flox: 0.9.6
numpy_groupies: 0.10.2
setuptools: 69.5.1
pip: 24.0
conda: None
pytest: 8.1.1
mypy: 1.8.0
IPython: 8.25.0
sphinx: None

@flamingbear flamingbear added bug needs triage Issue that has not been reviewed by xarray team member topic-DataTree Related to the implementation of a DataTree class labels Jul 24, 2024
@TomNicholas
Copy link
Member

Thanks for raising this @flamingbear !

Looks like this bug has been in there since the very start!

It's caused by how the internal ._set_item() has to walk out to leaf nodes, creating new intermediate nodes in its path (and potentially overwriting them if they already exist).

It's possible this kind of behaviour should be more closely guarded against in ._set_item, but there is an easy fix for this bug: just sort the dict by key first so you are always inserting parents before children (see #9292).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug needs triage Issue that has not been reviewed by xarray team member topic-DataTree Related to the implementation of a DataTree class
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants