Migrate datatree.py module into xarray.core. #6

owenlittlejohns · 2024-02-19T18:20:35Z

This draft PR shows initial work to migrate the datatree.py module to xarray/core/datatree.py. Once the treenode.py PR is merged, I'll likely create a new branch that puts this work on top of the latest main branch of the main xarray repository.

Most of the changes are import path changes and type hints, but there are a couple of things I wanted to ask about, so I'll leave comments in the code relating to those.

Completes the migration step for datatree/datatree.py in "Track merging datatree into xarray" (pydata/xarray#8572)
Tests added
[N/A] User visible changes (including notable bug fixes) are documented in whats-new.rst
Internal Changes (including notable bug fixes) are documented in whats-new.rst
[N/A] New functions/methods are listed in api.rst

owenlittlejohns · 2024-02-19T18:36:41Z

xarray/datatree_/datatree/__init__.py

@@ -1,15 +1,11 @@
 # import public API
-from .datatree import DataTree
-from .extensions import register_datatree_accessor


Without these changes, migrating datatree.py this early in proceedings led to circular dependencies:

xarray.core.datatree

➡️ a few things, e.g., xarray.datatree_.datatree.formatting

➡️ xarray.datatree_.datatree.__init__.py

➡️ xarray.datatree_.datatree.extensions

➡️ xarray.core.datatree.
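To make the loop concrete, here is a minimal, self-contained reproduction. It is a sketch under assumptions: core_datatree and the legacy package are hypothetical stand-ins for xarray.core.datatree and xarray.datatree_.datatree, and only the import pattern is modelled. The eager variant re-exports from extensions in __init__.py, which is the pattern the diff above removes.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def try_import(eager_init: bool) -> subprocess.CompletedProcess:
    """Build a toy copy of the import cycle and try to import the 'core' module.

    core_datatree and legacy are hypothetical stand-ins for
    xarray.core.datatree and xarray.datatree_.datatree.
    """
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Stand-in for xarray/core/datatree.py: it needs a formatter from the old package.
        (root / "core_datatree.py").write_text(
            "from legacy.formatting import datatree_repr\n"
            "\n"
            "class DataTree:\n"
            "    pass\n"
        )
        legacy = root / "legacy"
        legacy.mkdir()
        # Stand-in for datatree_/datatree/__init__.py. Importing any legacy
        # submodule runs this file first, so an eager re-export joins the cycle.
        init_src = "from .extensions import register_datatree_accessor\n" if eager_init else ""
        (legacy / "__init__.py").write_text(init_src)
        (legacy / "formatting.py").write_text(
            "def datatree_repr(dt):\n"
            "    return '<DataTree>'\n"
        )
        # extensions needs DataTree, which closes the loop back to core_datatree.
        (legacy / "extensions.py").write_text(
            "from core_datatree import DataTree\n"
            "\n"
            "def register_datatree_accessor(name):\n"
            "    return DataTree\n"
        )
        # A fresh interpreter keeps this process's sys.modules untouched.
        env = {**os.environ, "PYTHONPATH": str(root)}
        return subprocess.run(
            [sys.executable, "-c", "import core_datatree"],
            cwd=root,
            env=env,
            capture_output=True,
            text=True,
        )


if __name__ == "__main__":
    eager = try_import(eager_init=True)
    print(eager.returncode != 0)  # True
    print(eager.stderr.strip().splitlines()[-1])  # the ImportError message
    print(try_import(eager_init=False).returncode)  # 0
```

With the eager __init__.py, the child process fails with an ImportError about a partially initialized module (DataTree is not defined yet when extensions asks for it); with an empty __init__.py the same import succeeds.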

I am a bit wary here, though, as I don't want to lose track of the things being pulled out of the public API. I'm guessing everything that was originally in here should ultimately end up in the main xarray public API? And that this is essentially the final step in the migration (ignoring future refactoring).

@TomNicholas - Maybe all I'm asking here is: should we make the addition of the datatree public API items into the xarray public API an explicit step on the main xarray issue?

I'm guessing everything that was originally in here should ultimately end up in the main xarray public API?

Yes, I don't think there is anything superfluous.

And that this is essentially the final step in the migration (ignoring future refactoring).

should we make the addition of the datatree public API items into the xarray public API an explicit step on the pydata#8572?

Yep - I'll add it to the list.

owenlittlejohns · 2024-02-19T18:38:48Z

xarray/datatree_/datatree/tests/test_extensions.py

@@ -1,6 +1,7 @@
 import pytest

-from xarray.datatree_.datatree import DataTree, register_datatree_accessor
+from xarray.core.datatree import DataTree
+from xarray.datatree_.datatree.extensions import register_datatree_accessor


This is part of the minor change to avoid circular dependencies (register_datatree_accessor is no longer part of the public datatree API).

owenlittlejohns · 2024-02-19T18:39:44Z

xarray/tests/datatree/test_datatree.py

@@ -166,8 +167,7 @@ def test_assign_when_already_child_with_variables_name(self):
            dt.ds = new_ds


-class TestGet:
-    ...
+class TestGet: ...


The stuff at the bottom here is just from me running pre-commit run --all-files.

I kind of wish I had thought of that. 🙃

Oh yeah - I regularly use that command whilst developing

owenlittlejohns · 2024-02-19T18:41:39Z

xarray/core/datatree.py

-from .ops import (
+from xarray.datatree_.datatree.formatting import datatree_repr
+from xarray.datatree_.datatree.formatting_html import (
+    datatree_repr as datatree_repr_html,


Other than the comments on the datatree public API, this is the only change that isn't just import paths or type hints. It's very minor, and I'm happy to revert to importing the whole datatree.formatting and datatree.formatting_html modules if that is preferred.

flamingbear · 2024-02-20T20:44:10Z

doc/whats-new.rst

@@ -108,6 +108,10 @@ Internal Changes
  `Matt Savoie <https://github.com/flamingbear>`_ and `Tom Nicholas
  <https://github.com/TomNicholas>`_.

+- Migrates ``datatree`` functionality into ``xarray/core``. (:pull: `8757`)


You won't know the actual PR number until you open this against pydata/xarray (draft or real).

Oh yeah - good catch!

TomNicholas · 2024-02-27T20:00:28Z

FYI, the treenode.py module has now been merged, so you should be able to open this against xarray main.

owenlittlejohns · 2024-02-27T20:09:50Z

Thanks for the heads-up, @TomNicholas. I will get a PR up against the latest HEAD of the main xarray repository today. (I might clean up the commit history a bit and cherry-pick the commits onto a fresh branch, but I'll get that squared away ASAP.)

flamingbear

Just submitting so you can see what I found. I'm not approving or requesting changes, since you're going to put this up against xarray/main.

xarray/core/datatree.py

flamingbear · 2024-02-27T20:37:58Z

xarray/core/datatree.py

@@ -1027,7 +1021,7 @@ def drop_nodes(
            if extra:
                raise KeyError(f"Cannot drop all nodes - nodes {extra} not present")

-        children_to_keep = OrderedDict(
+        children_to_keep = dict(


Separately, on this line you can drop the dict() wrapper and just use a dict comprehension:

children_to_keep = {name: child for name, child in self.children.items() if name not in names}
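For reference, a small sketch of the two forms side by side. The argument to OrderedDict( is cut off in the diff, so the "old" version assumes it wrapped a generator of (name, child) pairs; a plain dict of integers stands in for self.children.

```python
from collections import OrderedDict


def children_to_keep_old(children, names):
    # Assumed shape of the original code: OrderedDict built from a generator.
    return OrderedDict(
        (name, child) for name, child in children.items() if name not in names
    )


def children_to_keep_new(children, names):
    # Suggested form: a dict comprehension. Plain dicts preserve insertion
    # order (guaranteed since Python 3.7), so child ordering is unchanged.
    return {name: child for name, child in children.items() if name not in names}


if __name__ == "__main__":
    children = {"a": 1, "b": 2, "c": 3}
    print(children_to_keep_new(children, {"b"}))  # {'a': 1, 'c': 3}
```

Because dict ordering is guaranteed, OrderedDict only adds value here if something relies on its extra methods (such as move_to_end) or its order-sensitive equality.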

flamingbear · 2024-02-27T20:44:15Z

xarray/core/datatree.py

Another issue.

https://github.com/flamingbear/xarray/blob/DAS-2062-migrate-datatree-module/xarray/core/datatree.py#L1447

The as_array method calls self.ds.as_dataarray(), which I don't think exists on Dataset or DatasetView.

def as_array(self) -> DataArray:
    return self.ds.as_dataarray()

I can follow it back to Dataset, but I think that's a numpy method. My guess is that it should be self.ds.to_dataarray(), but that's only a guess.

Dataset used to have a to_array method, but it was renamed to the clearer to_dataarray some time after I wrote to_array into datatree.

Okay, given that the underlying method in Dataset has changed, would the recommendation here be to rename this method to to_dataarray, to provide consistency between the classes?

Yes definitely. I don't think we need to be particularly concerned with exact backwards compatibility with a prototype, though it might be nice to keep a record of any small "breaking changes" like this so we can include it when we make xr.DataTree public.

flamingbear · 2024-02-27T20:49:34Z

xarray/core/datatree.py

Another issue, though not one you caused.

https://github.com/flamingbear/xarray/blob/DAS-2062-migrate-datatree-module/xarray/core/datatree.py#L1508

This docstring looks like it was copied from elsewhere (the to_zarr methods of Dataset and DataArray) and then badly formatted. The problem is that here mode has a default value of "w-", so unless None is passed explicitly, the remarks that follow are incorrect. I would probably change this:

The default mode is “a” if append_dim is set. Otherwise, it is “r+” if region is set and “w-” otherwise.

to be

The default mode is “w-”.

The to_zarr in Dataset/DataArray does handle the None case*, but this currently does not: when creating the empty zarr group, it just uses the mode passed in, and there is no logic to determine the correct mode.

*Actually, I thought that logic was in Dataset/DataArray, but it's in api, and now I'm second-guessing myself. Maybe that should be fixed when _datatree_to_zarr is migrated from io.py.
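As a sketch of the default-mode rule quoted from the docstring, assuming DataTree.to_zarr were changed to default mode to None: resolve_zarr_mode is a hypothetical helper, not an xarray function, and the append_dim and region arguments are included only to mirror the docstring text (whether DataTree.to_zarr accepts them is not shown here). xarray's own to_zarr also checks for conflicting combinations; that validation is omitted here.

```python
from typing import Optional


def resolve_zarr_mode(
    mode: Optional[str] = None,
    append_dim: Optional[str] = None,
    region: Optional[dict] = None,
) -> str:
    """Pick a zarr write mode following the rule in the to_zarr docstring.

    Hypothetical helper for illustration; not part of xarray.
    """
    if mode is not None:
        # An explicitly requested mode is used as-is.
        return mode
    if append_dim is not None:
        return "a"
    if region is not None:
        return "r+"
    return "w-"


if __name__ == "__main__":
    print(resolve_zarr_mode())  # w-
    print(resolve_zarr_mode(append_dim="time"))  # a
    print(resolve_zarr_mode(region={"x": slice(0, 10)}))  # r+
```

Keeping the signature default at None (rather than "w-") is what lets the append_dim and region branches ever be reached; with a hard-coded "w-" default, the first branch always wins.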

@owenlittlejohns I think something here needs updating.

owenlittlejohns · 2024-02-27T22:29:10Z

Here's the real PR for this: pydata#8789!

owenlittlejohns commented Feb 19, 2024


owenlittlejohns requested a review from flamingbear February 19, 2024 18:42

flamingbear reviewed Feb 20, 2024


owenlittlejohns force-pushed the DAS-2062-migrate-datatree-module branch from 1ef4006 to 025db6d February 21, 2024 22:23

Migrate datatree.py module into xarray.core.

6f5af84

owenlittlejohns force-pushed the DAS-2062-migrate-datatree-module branch from 025db6d to 6f5af84 February 27, 2024 15:41

flamingbear reviewed Feb 27, 2024


owenlittlejohns mentioned this pull request Feb 27, 2024

Migrate datatree.py module into xarray.core. pydata/xarray#8789 (Merged)

flamingbear deleted the branch mhs/migrate_treenode February 28, 2024 22:02

flamingbear closed this Feb 28, 2024

owenlittlejohns deleted the DAS-2062-migrate-datatree-module branch March 26, 2024 17:15

