Re-implement map_over_datasets #2

* test unary op * implement and generate unary ops * test for unary op with inherited coordinates * re-enable arithmetic tests * implementation for binary ops * test ds * dt commutativity * ensure other types defer to DataTree, thus fixing pydata#9365 * test for inplace binary op * pseudocode implementation of inplace binary op, and xfail test * remove some unneeded type: ignore comments * return type should be DataTree * type datatree ops as accepting dataset-compatible types too * use same type hinting hack as Dataset does for __eq__ not being same as Mapping * ignore return type * add some methods to api docs * don't try to import DataTree.astype in API docs * test to check that single-node trees aren't broadcast * return NotImplemented * remove pseudocode for inplace binary ops * map_over_subtree -> map_over_datasets

* sketch of migration guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * whatsnew * add date * spell out API changes in more detail * details on backends integration * explain alignment and open_groups * explain coordinate inheritance * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * re-trigger CI * remove bullet about map_over_subtree * Markdown formatting for important warning block Co-authored-by: Matt Savoie <github@flamingbear.com> * Reorder changes in order of importance Co-authored-by: Matt Savoie <github@flamingbear.com> * Clearer wording on setting relationships Co-authored-by: Matt Savoie <github@flamingbear.com> * remove "technically" Co-authored-by: Matt Savoie <github@flamingbear.com> --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Matt Savoie <github@flamingbear.com>

The main changes: - It is implemented using zip_subtrees, which means it should properly handle DataTrees where the nodes are defined in a different order. - For simplicity, I removed handling of `**kwargs`, in order to preserve some flexibility for adding keyword arugments. - I removed automatic skipping of empty nodes, because there are almost assuredly cases where that would make sense. This could be restored with a option keyword arugment.

As mentioned in pydata#2157, the docstring of `Dataset.groupby` does not reflect deprecation of squeeze (as the docstring of `DataArray.groupby` does) and states an incorrect default value.

* Add inherit=False option to DataTree.copy() This PR adds a inherit=False option to DataTree.copy, so users can decide if they want to inherit coordinates from parents or not when creating a subtree. The default behavior is `inherit=True`, which is a breaking change from the current behavior where parent coordinates are dropped (which I believe should be considered a bug). * fix typing * add migration guide note * ignore typing error

* Bug fixes for DataTree indexing and aggregation My implementation of indexing and aggregation was incorrect on child nodes, re-creating the child nodes from the root. There was also another bug when indexing inherited coordinates that meant formerly inherited coordinates were incorrectly dropped from results. * disable broken test

@headtr1ck

as suggested by @headtr1ck in pydata#9628 (comment)

* type hints for datatree ops tests * type hints for datatree aggregations tests * type hints for datatree indexing tests * type hint a lot more tests * more type hints

* Add zip_subtrees for paired iteration over DataTrees This should be used for implementing DataTree arithmetic inside map_over_datasets, so the result does not depend on the order in which child nodes are defined. I have also added a minimal implementation of breadth-first-search with an explicit queue the current recursion based solution in xarray.core.iterators (which has been removed). The new implementation is also slightly faster in my microbenchmark: In [1]: import xarray as xr In [2]: tree = xr.DataTree.from_dict({f"/x{i}": None for i in range(100)}) In [3]: %timeit _ = list(tree.subtree) # on main 87.2 μs ± 394 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each) # with this branch 55.1 μs ± 294 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each) * fix pytype error * Tweaks per review

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Re-implement map_over_datasets #2

Re-implement map_over_datasets #2

Commits on Oct 15, 2024

Commits on Oct 16, 2024