Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing map_overlap #3147

Open
jakirkham opened this issue Jul 18, 2019 · 13 comments
Open

Implementing map_overlap #3147

jakirkham opened this issue Jul 18, 2019 · 13 comments

Comments

@jakirkham
Copy link

jakirkham commented Jul 18, 2019

Just as there are map_blocks and map_overlap methods for Dask Array, it would be useful to have equivalent methods for Xarray objects. This would make it easier to leverage duck typing to work with both Dask Arrays and Xarray objects.

Edit: Should add this came up a few times at the recent SciPy sprints.

@dcherian
Copy link
Contributor

+1. The split_by_chunks method in this comment (#1093 (comment)) would also be useful for more general per-chunk manipulation.

@jakirkham
Copy link
Author

That sounds somewhat similar to .blocks accessor in Dask Array. ( dask/dask#3689 ) Maybe we should align on that as well?

@jakirkham
Copy link
Author

jakirkham commented Jul 19, 2019

Another approach for the split_by_chunks implementation would be...

def split_by_chunks(a):
    for sl in da.core.slices_from_chunks(a.chunks): 
        yield (sl, a[sl])

While a little bit more cumbersome to write, this could be implemented with .blocks and may be a bit more performant.

def split_by_chunks(a):
    for i, sl in zip(np.ndindex(a.numblocks), da.core.slices_from_chunks(a.chunks)):
        yield (sl, a.blocks[i])

If the slices are not strictly needed, this could be simplified a bit more.

def split_by_chunks(a):
    for i in np.ndindex(a.numblocks):
        yield a.blocks[i]

Admittedly slices_from_chunks is an internal utility function. Though it is unlikely to change. We could consider exposing it as part of the API if that is useful.

We could consider other things like making .blocks iterable, which could make this more friendly as well. Raised issue ( dask/dask#5117 ) on this point.

@jhamman
Copy link
Member

jhamman commented Oct 11, 2019

map_blocks went in as of #3276. We'll leave this open for the future work implementing map_overlap.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Aug 3, 2020

I'm thinking through a map_overlap API right now. In dask, map_overlap requires a few extra arguments

    depth: int, tuple, dict or list
        The number of elements that each block should share with its neighbors
        If a tuple or dict then this can be different per axis.
        If a list then each element of that list must be an int, tuple or dict
        defining depth for the corresponding array in `args`.
        Asymmetric depths may be specified using a dict value of (-/+) tuples.
        Note that asymmetric depths are currently only supported when
        ``boundary`` is 'none'.
        The default value is 0.
    boundary: str, tuple, dict or list
        How to handle the boundaries.
        Values include 'reflect', 'periodic', 'nearest', 'none',
        or any constant value like 0 or np.nan.
        If a list then each element must be a str, tuple or dict defining the
        boundary for the corresponding array in `args`.
        The default value is 'reflect'.

In dask.array those must be dicts whose keys are the axis number. For xarray we would want to allow the dimension names there.

I'm not sure how to handle the DataArray labels for the boundary chunks (dask docs at https://docs.dask.org/en/latest/array-overlap.html#boundaries). For reflect / periodic I think things are OK, we perhaps just use the label associated with that value. I'm not sure what to do for constants.

@dcherian
Copy link
Contributor

dcherian commented Aug 3, 2020

This issue about coordinate labels for boundaries exists with pad too: #3868

Can map_overlap just use DataArray.pad and we can fix things there?

Or perhaps we can expect users to add a call to pad before map_overlap?

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Aug 3, 2020 via email

@jakirkham
Copy link
Author

Yeah +1 for using pad instead. Had tried to get rid of map_overlap's padding and use da.pad in Dask as well ( dask/dask#5052 ), but haven't had time to get back to that.

@stale
Copy link

stale bot commented Apr 17, 2022

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically

@stale stale bot added the stale label Apr 17, 2022
@jakirkham
Copy link
Author

Would be good to keep this open

@stale stale bot removed the stale label Apr 17, 2022
@j2bbayle
Copy link

Indeed, this would be very useful in a great number of cases!

@jdoblas
Copy link

jdoblas commented Apr 8, 2023

very much in need of this one to able satellite image filtering across blocks

@dcherian dcherian changed the title Implementing map_blocks and map_overlap Implementing map_overlap Jun 30, 2024
@odhondt
Copy link

odhondt commented Sep 24, 2024

+1 that would be super useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants