Reimplement DataTree aggregations #9589
They now allow for dimensions that are missing on particular nodes, and use Xarray's standard generate_aggregations machinery, like aggregations for DataArray and Dataset. Fixes pydata#8949, pydata#8963
for node in self.subtree:
    reduce_dims = [d for d in node._node_dims if d in dims]
    node_result = node.dataset.reduce(
        func,
        reduce_dims,
        keep_attrs=keep_attrs,
        keepdims=keepdims,
        numeric_only=numeric_only,
        **kwargs,
    )
Why does the concern in #9588 (comment) not apply here too?
#9588 has special logic for handling cases where coordinates that formerly had an index are reduced to scalars. That can't happen for aggregation.
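For readers following the loop under review, here is a minimal sketch of the per-node dimension filtering, written in plain Python without xarray. The `tree_dims` mapping and the `dims_to_reduce` helper are hypothetical stand-ins for illustration only; they are not part of xarray's API.

```python
# Hypothetical stand-in for a tree: node path -> dimension names on that node.
tree_dims = {
    "/": ("time",),
    "/coarse": ("time", "x"),
    "/fine": ("time", "x", "y"),
}


def dims_to_reduce(node_dims, requested):
    """Keep only the requested dimensions that this node actually has."""
    return [d for d in node_dims if d in requested]


# Reducing over "x" and "y" touches only the nodes that define them;
# the root node, which has neither, is left with nothing to reduce.
requested = {"x", "y"}
per_node = {path: dims_to_reduce(dims, requested) for path, dims in tree_dims.items()}
print(per_node)
```

Under these assumptions, a node that lacks a requested dimension simply receives an empty list instead of raising an error, which is the behavior the PR description refers to.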
xarray/core/utils.py
Outdated
@@ -830,6 +830,26 @@ def drop_dims_from_indexers(
    )


def dim_arg_to_dims_set(dim: Dims, all_dims: Collection) -> set:
What exactly is the difference from parse_dims (except that it returns a tuple, which is hardly a reason to add a new method)?
Good catch! I switched to using parse_dims instead.
Hmm, it seems a bit more efficient to keep things as sets. For now I've added parse_dims_as_set, which looks like a slightly better fit for the one other use of parse_dims that I could find.
Were there other intended uses for parse_dims and parse_ordered_dims? I was surprised to only find one use of parse_dims inside Xarray.
The intention was to ultimately use this in all methods that expect one or more dims.
But given that every method somehow handles multiple dims differently, nobody was brave enough to change that, because it would be a somewhat breaking change.
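As a rough illustration of what normalizing a dimension argument to a set can look like, here is a small standalone sketch. The name `dims_to_set` and its error message are hypothetical; this is not the implementation of parse_dims or parse_dims_as_set. It assumes that `None` and `...` both mean "all dimensions".

```python
def dims_to_set(dim, all_dims):
    """Normalize a dimension argument to a set (hypothetical helper).

    Assumes None and Ellipsis both select every available dimension.
    """
    all_dims = set(all_dims)
    if dim is None or dim is ...:
        return all_dims
    # A single string is one dimension name, not an iterable of characters.
    dims = {dim} if isinstance(dim, str) else set(dim)
    missing = dims - all_dims
    if missing:
        raise ValueError(
            f"dimensions {sorted(missing)} not found in {sorted(all_dims)}"
        )
    return dims


print(dims_to_set("x", ["x", "y"]))
print(dims_to_set(None, ["x", "y"]) == {"x", "y"})
```

Returning a set keeps membership checks cheap, at the cost of losing the order in which the dimensions were given.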
@@ -1607,3 +1616,35 @@ def to_zarr(
            compute=compute,
            **kwargs,
        )

    def _get_all_dims(self) -> set:
That sounds quite useful. Maybe this should be exposed as a public API (maybe a property)?
I'm open to someone adding this later!
I have to admit that I have never used DataTree... But what exactly does DataTree.dims return, and how is it different from this?
Edit: is it full tree vs subtree dims?
DataTree.dims returns all dimensions defined at the base level of the tree. This method also returns dimensions defined on descendant nodes.
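The distinction can be sketched with a toy tree in plain Python. The `Node` class and the `all_dims` function below are hypothetical stand-ins, not xarray's DataTree or its `_get_all_dims` method; they only illustrate collecting dimensions from a node together with all of its descendants.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: its own dimension names plus named children."""

    dims: set
    children: dict = field(default_factory=dict)


def all_dims(node):
    """Union of the dimensions on this node and on every descendant."""
    result = set(node.dims)
    for child in node.children.values():
        result |= all_dims(child)
    return result


root = Node(
    dims={"time"},
    children={"fine": Node(dims={"x", "y"}, children={"finer": Node(dims={"z"})})},
)

print(sorted(root.dims))
print(sorted(all_dims(root)))
```

Here `root.dims` plays the role of the base-level dimensions, while `all_dims(root)` also picks up dimensions that exist only on descendant nodes.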
@shoyer FYI the test failures are real - parse_dims_as_set apparently breaks some expected error messages and also causes a typing error. I'm happy to finish this off if you're busy?
Would be great if you have time to look into this soon! Otherwise I will in a day or two.
I've fixed the typing errors and the test failures. I will merge this tomorrow unless anyone has any objections to my fixes.
Looks great, thanks Tom!