-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate datatree.py module into xarray.core. #6
Migrate datatree.py module into xarray.core. #6
Conversation
@@ -1,15 +1,11 @@ | |||
# import public API | |||
from .datatree import DataTree | |||
from .extensions import register_datatree_accessor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without these changes, migrating datatree.py
this early in proceedings led to circular dependencies:
xarray.core.datatree
- ➡️ a few things, e.g.,
xarary.datatree_.datatree.formatting
- ➡️
xarray.datatree_.datatree.__init__.py
- ➡️
xarray.datatree_.datatree.extensions
- ➡️
xarray.core.datatree
.
I am a bit wary here, though, as I don't want to lose track of the things being pulled out of the public API. I'm guessing everything that was originally in here should ultimately end up in the main xarray public API? And that this is essentially the final step in the migration (ignoring future refactoring).
@TomNicholas - Maybe all I'm asking here is: should we make the addition of the datatree
public API items into the xarray
public API an explicit step on the main xarray issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing everything that was originally in here should ultimately end up in the main xarray public API?
Yes, I don't think there is anything superfluous.
And that this is essentially the final step in the migration (ignoring future refactoring).
should we make the addition of the datatree public API items into the xarray public API an explicit step on the pydata#8572?
Yep - I'll add it to the list.
@@ -1,6 +1,7 @@ | |||
import pytest | |||
|
|||
from xarray.datatree_.datatree import DataTree, register_datatree_accessor | |||
from xarray.core.datatree import DataTree | |||
from xarray.datatree_.datatree.extensions import register_datatree_accessor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is part of the minor change to avoid circular dependencies (register_datatree_accessor
is no longer part of the public datatree API).
@@ -166,8 +167,7 @@ def test_assign_when_already_child_with_variables_name(self): | |||
dt.ds = new_ds | |||
|
|||
|
|||
class TestGet: | |||
... | |||
class TestGet: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stuff at the bottom here is just from me running pre-commit run --all-files
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of wish I had thought of that. 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah - I regularly use that command whilst developing
from .ops import ( | ||
from xarray.datatree_.datatree.formatting import datatree_repr | ||
from xarray.datatree_.datatree.formatting_html import ( | ||
datatree_repr as datatree_repr_html, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than the comments on the datatree
public API, this is the only other change that isn't really import paths or type hints. It's incredibly minor, happy to revert back to importing the whole datatree.formatting
and datatree.formatting_html
modules if that is preferred.
@@ -108,6 +108,10 @@ Internal Changes | |||
`Matt Savoie <https://github.com/flamingbear>`_ and `Tom Nicholas | |||
<https://github.com/TomNicholas>`_. | |||
|
|||
- Migrates ``datatree`` functionality into ``xarray/core``. (:pull: `8757`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you won't know this actual PR value until you open against pydata/xarray (draft or real)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah - good catch!
1ef4006
to
025db6d
Compare
025db6d
to
6f5af84
Compare
FYI the |
Thanks for the heads up @TomNicholas - I will get a PR up against the latest HEAD of the main xarray repository today. (I might clean up the commit history a bit and cherry pick the commits on a fresh branch, but will get that squared away ASAP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just submitting so you can see what I found. No approval or not since you're going to put it up to xarray/main
@@ -1027,7 +1021,7 @@ def drop_nodes( | |||
if extra: | |||
raise KeyError(f"Cannot drop all nodes - nodes {extra} not present") | |||
|
|||
children_to_keep = OrderedDict( | |||
children_to_keep = dict( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separately on this line, you can get rid of the dict wrapper and just use
children_to_keep = {name: child for name, child in self.children.items() if name not in names}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another issue.
This method as_array, I don't think exists on Dataset or DatasetView.
def as_array(self) -> DataArray:
return self.ds.as_dataarray()
I can follow it back to Dataset, but I think that's a numpy method. My guess is that it should be self.ds.to_dataarray()
but that's a guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dataset
used to have a to_array
method but it got renamed to the clearer to_dataarray
sometime after I wrote to_array
into datatree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, given that the underlying method in Dataset
has changed, would the recommendation here be to rename this method to to_dataarray
, to provide consistency between the classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes definitely. I don't think we need to be particularly concerned with exact backwards compatibility with a prototype, though it might be nice to keep a record of any small "breaking changes" like this so we can include it when we make xr.DataTree
public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another issue, you didn't cause.
This doc string looks like it was pulled from another location (Dataset and Dataarray's to_zarr) and then formatted badly. The issue is that In this case, the mode is given a default value of "w-" so that unless None is specified, the following remarks are incorrect.
I would probably update this below.
The default mode is “a” if append_dim is set. Otherwise, it is “r+” if region is set and w- otherwise.
to be
The default mode is “w-".
The to_zarr in dataset/dataarray does handle the None case*, but this currently does not. in create empty zarr group it just uses the mode passed in and there no logic to determine the correct type.
- actually I thought that was in the dataset/dataarray, but it's in the api and now I'm second guessing myself, maybe that should be fixed when _datatree_to_zarr is migrated from io.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@owenlittlejohns I think something here needs updating.
Here's the real PR for this: pydata#8789! |
This draft PR shows initial work to migrate the
datatree.py
module toxarray/core/datatree.py
. Once the treenode.py PR is merged, I'll likely create a new branch putting the new work here on top of the latest from the mainxarray
repositorymain
branch.Most of the changes are import path changes, and type-hints, but there are a couple of things I wanted to ask about, so I'll leave comments in the code relating to those.
datatree/datatree.py
Track merging datatree into xarray pydata/xarray#8572Internal Changes (including notable bug fixes) are documented in whats-new.rst