Re-implement map_over_datasets #2

Closed
wants to merge 10 commits into from

Commits on Oct 15, 2024

  1. Reimplement Datatree typed ops (pydata#9619)

    * test unary op
    
    * implement and generate unary ops
    
    * test for unary op with inherited coordinates
    
    * re-enable arithmetic tests
    
    * implementation for binary ops
    
    * test ds * dt commutativity
    
    * ensure other types defer to DataTree, thus fixing pydata#9365
    
    * test for inplace binary op
    
    * pseudocode implementation of inplace binary op, and xfail test
    
    * remove some unneeded type: ignore comments
    
    * return type should be DataTree
    
    * type datatree ops as accepting dataset-compatible types too
    
    * use same type hinting hack as Dataset does for __eq__ not being same as Mapping
    
    * ignore return type
    
    * add some methods to api docs
    
    * don't try to import DataTree.astype in API docs
    
    * test to check that single-node trees aren't broadcast
    
    * return NotImplemented
    
    * remove pseudocode for inplace binary ops
    
    * map_over_subtree -> map_over_datasets
    TomNicholas authored Oct 15, 2024
    97ec434
  2. Migration guide for users of old datatree repo (pydata#9598)

    * sketch of migration guide
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * whatsnew
    
    * add date
    
    * spell out API changes in more detail
    
    * details on backends integration
    
    * explain alignment and open_groups
    
    * explain coordinate inheritance
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * re-trigger CI
    
    * remove bullet about map_over_subtree
    
    * Markdown formatting for important warning block
    
    Co-authored-by: Matt Savoie <github@flamingbear.com>
    
    * Reorder changes in order of importance
    
    Co-authored-by: Matt Savoie <github@flamingbear.com>
    
    * Clearer wording on setting relationships
    
    Co-authored-by: Matt Savoie <github@flamingbear.com>
    
    * remove "technically"
    
    Co-authored-by: Matt Savoie <github@flamingbear.com>
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Matt Savoie <github@flamingbear.com>
    3 people authored Oct 15, 2024
    97799e8
  3. Re-implement map_over_datasets

    The main changes:
    
    - It is implemented using zip_subtrees, which means it should properly
      handle DataTrees where the nodes are defined in a different order.
    - For simplicity, I removed handling of `**kwargs`, in order to preserve
      some flexibility for adding keyword arguments.
    - I removed automatic skipping of empty nodes, because there are almost
      assuredly cases where mapping over empty nodes would make sense.
      Skipping could be restored with an optional keyword argument.
    shoyer committed Oct 15, 2024
    739573a
  4. docs(groupby): mention deprecation of squeeze kwarg (pydata#9625)

    As mentioned in pydata#2157, the docstring of `Dataset.groupby` does not
    reflect deprecation of squeeze (as the docstring of `DataArray.groupby`
    does) and states an incorrect default value.
    Sibgatulin authored Oct 15, 2024
    c3dabe1
  5. Add inherit=False option to DataTree.copy() (pydata#9628)

    * Add inherit=False option to DataTree.copy()
    
    This PR adds an inherit=False option to DataTree.copy, so users can
    decide if they want to inherit coordinates from parents or not when
    creating a subtree.
    
    The default behavior is `inherit=True`, which is a breaking change from
    the current behavior where parent coordinates are dropped (which I
    believe should be considered a bug).
    
    * fix typing
    
    * add migration guide note
    
    * ignore typing error
    shoyer authored Oct 15, 2024
    56f0e48
  6. Bug fixes for DataTree indexing and aggregation (pydata#9626)

    * Bug fixes for DataTree indexing and aggregation
    
    My implementation of indexing and aggregation was incorrect on child
    nodes, re-creating the child nodes from the root.
    
    There was also another bug when indexing inherited coordinates that meant
    formerly inherited coordinates were incorrectly dropped from results.
    
    * disable broken test
    shoyer authored Oct 15, 2024
    7486f4e
  7. aafc278
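
The re-implementation of map_over_datasets described in commit 3 above pairs nodes across trees instead of relying on the order in which they were defined. The following is only a minimal pure-Python sketch of that idea, not xarray's implementation: the `Tree` alias (a flat mapping from node path to a dict of values), `map_over_nodes`, and `add` are hypothetical names introduced for illustration. Like the commit message describes, it takes positional arguments only and does not skip empty nodes.

```python
from typing import Any, Callable

# Hypothetical stand-in for a tree: node path -> {variable name: value}.
Tree = dict[str, dict[str, float]]


def map_over_nodes(func: Callable[..., Any], *trees: Tree) -> dict[str, Any]:
    """Apply ``func`` to the nodes found at the same path in every tree."""
    first, *rest = trees
    for other in rest:
        if other.keys() != first.keys():
            raise ValueError("trees do not have the same set of node paths")
    # Nodes are looked up by path, so the order in which the other trees
    # define their nodes does not affect the result.
    # Empty nodes are passed to ``func`` like any other node (no skipping).
    return {path: func(*(tree[path] for tree in trees)) for path in first}


def add(left: dict[str, float], right: dict[str, float]) -> dict[str, float]:
    """Add two nodes variable by variable; an empty node yields an empty node."""
    return {name: left[name] + right[name] for name in left}


a = {"/": {}, "/x": {"t": 1.0}, "/y": {"t": 2.0}}
b = {"/": {}, "/y": {"t": 20.0}, "/x": {"t": 10.0}}  # children in a different order
result = map_over_nodes(add, a, b)
```

Because the empty root node is handed to `add` as well, the mapped function has to tolerate empty inputs; that is the trade-off of dropping automatic skipping.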

Commits on Oct 16, 2024

  1. Type check datatree tests (pydata#9632)

    * type hints for datatree ops tests
    
    * type hints for datatree aggregations tests
    
    * type hints for datatree indexing tests
    
    * type hint a lot more tests
    
    * more type hints
    TomNicholas authored Oct 16, 2024
    88a95cf
  2. Add zip_subtrees for paired iteration over DataTrees (pydata#9623)

    * Add zip_subtrees for paired iteration over DataTrees
    
    This should be used for implementing DataTree arithmetic inside
    map_over_datasets, so the result does not depend on the order in which
    child nodes are defined.
    
    I have also added a minimal implementation of breadth-first search with
    an explicit queue, replacing the current recursion-based solution in
    xarray.core.iterators (which has been removed). The new implementation
    is also slightly faster in my microbenchmark:
    
        In [1]: import xarray as xr
    
        In [2]: tree = xr.DataTree.from_dict({f"/x{i}": None for i in range(100)})
    
        In [3]: %timeit _ = list(tree.subtree)
        # on main
        87.2 μs ± 394 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
    
        # with this branch
        55.1 μs ± 294 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
    
    * fix pytype error
    
    * Tweaks per review
    shoyer authored Oct 16, 2024
    0c1d02e
  3. bed8cba
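
The zip_subtrees commit above mentions two techniques: breadth-first search driven by an explicit queue rather than recursion, and paired iteration that does not depend on the order in which child nodes are defined. A rough sketch of both follows, under the assumption of a toy `Node` class with name-keyed children. `Node`, `iter_breadth_first`, and `zip_by_name` are hypothetical names for illustration and are not taken from xarray.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """Hypothetical tree node; not xarray's DataTree."""

    name: str
    children: dict[str, Node] = field(default_factory=dict)


def iter_breadth_first(root: Node) -> Iterator[Node]:
    """Yield nodes level by level using an explicit FIFO queue (no recursion)."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(node.children.values())


def zip_by_name(*roots: Node) -> Iterator[tuple[Node, ...]]:
    """Yield tuples of corresponding nodes, matching children by name."""
    queue = deque([roots])
    while queue:
        nodes = queue.popleft()
        yield nodes
        first, *rest = nodes
        for other in rest:
            if other.children.keys() != first.children.keys():
                raise ValueError(f"children of {first.name!r} do not match")
        # Follow the first tree's child order; look the others up by name.
        for name in first.children:
            queue.append(tuple(n.children[name] for n in nodes))


a = Node("root", {"x": Node("x", {"z": Node("z")}), "y": Node("y")})
b = Node("root", {"y": Node("y"), "x": Node("x", {"z": Node("z")})})
order = [n.name for n in iter_breadth_first(a)]
pairs = [tuple(n.name for n in group) for group in zip_by_name(a, b)]
```

Here `b` defines its children in the opposite order to `a`, yet each tuple in `pairs` still holds nodes with the same name, because children are matched by key rather than by position.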