Pad method #2605

Closed
shoyer opened this issue Dec 13, 2018 · 9 comments · Fixed by #3596

Comments

@shoyer
Member

shoyer commented Dec 13, 2018

It would be nice to have a generic .pad() method on xarray objects, based on numpy.pad and dask.array.pad.

In particular, pad with mode='wrap' could solve several use cases related to periodic boundary conditions: #1005, #2007. For example, ds.pad(longitude=(0, 1), mode='wrap') would add an extra point with periodic boundary conditions along the longitude dimension.
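To make the wrap behaviour concrete, here is a minimal sketch that calls numpy.pad directly, since that is what the proposed method would build on. The array values are made up for illustration; no xarray API is assumed.

```python
import numpy as np

# Hypothetical data sampled along a periodic longitude axis.
values = np.array([10.0, 20.0, 30.0, 40.0])

# Pad zero points before and one point after. With mode="wrap" the new
# trailing point is filled with the first value of the array.
wrapped = np.pad(values, (0, 1), mode="wrap")
print(wrapped)
```

The padded array ends with a copy of its first element, which is the extra point a periodic boundary condition needs.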

It probably makes sense to linearly extrapolate coordinates along padded dimensions, as long as they are regularly spaced. This might use heuristics and/or a keyword argument.
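One possible heuristic for the coordinate extrapolation is sketched below. The helper name extrapolate_coord and its return-None-on-irregular-spacing behaviour are assumptions for illustration, not part of xarray.

```python
import numpy as np


def extrapolate_coord(coord, pad_width):
    """Linearly extend a regularly spaced 1-D coordinate.

    Hypothetical helper: returns None when the spacing is irregular,
    leaving the caller to decide what to do with the padded labels.
    """
    coord = np.asarray(coord, dtype=float)
    before, after = pad_width
    if coord.size < 2:
        return None
    steps = np.diff(coord)
    if not np.allclose(steps, steps[0]):
        return None
    step = steps[0]
    # Continue the same step to the left of the first value and to the
    # right of the last value; this also works for descending coordinates.
    head = coord[0] + step * np.arange(-before, 0)
    tail = coord[-1] + step * np.arange(1, after + 1)
    return np.concatenate([head, coord, tail])


longitude = [0.0, 90.0, 180.0, 270.0]
print(extrapolate_coord(longitude, (0, 1)))
```

A keyword argument could then switch between this extrapolation and leaving the padded labels missing.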

I don't have plans to work on this in the near term. It could be a good project of moderate complexity for a new contributor.

@mark-boer
Contributor

Has someone started working on this? I thought I might give it a try ;-)

@jhamman
Member

jhamman commented Nov 2, 2019

@mark-boer - I don't think so. Give it a go!

@dcherian
Contributor

dcherian commented Nov 8, 2019

@mark-boer Note that we have a Variable.pad_with_fill_value method and some other padding code in rolling and coarsen that could be consolidated.

@mark-boer
Contributor

@dcherian Thanks! I had already found Variable.pad_with_fill_value, but I will also have a look at rolling and coarsen.

I have some questions related to this issue; I hope this is the correct place to ask them:

Dask added the pad method in version 1.7.0 according to its documentation. What is the minimum version of Dask that should be supported? I found that Dask 1.2 is in the continuous integration requirements.

Also, how should I handle implementation differences between numpy and Dask? For example, in the versions of Dask and numpy I'm using, mode="mean" converts an array of integers to floats in Dask, but numpy keeps it as an array of integers.
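The NumPy side of this difference can be checked with a few lines. This is only a sketch with made-up input values; the Dask side is left out because its result depends on the installed Dask version.

```python
import numpy as np

# Integer input whose mean (7 / 3) is not a whole number.
ints = np.array([1, 2, 4])

# Pad one element on each side with the mean of the array.
padded = np.pad(ints, 1, mode="mean")

# Print the dtype and values to see whether the integer dtype is kept.
print(padded.dtype, padded)
```

Running the same call through dask.array.pad and comparing the dtype would give a small reproducer for a bug report.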

@dcherian
Contributor

You can copy dask.array.pad over to dask_array_compat.py and do something like:

if LooseVersion(dask_version) >= LooseVersion("1.7.0"):
    pad = dask.array.pad
else:
    # copied from dask.array
    def pad(...):
        ...
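The same version-gating pattern can be sketched without any dependency on Dask. The helpers version_tuple and choose_pad are hypothetical names used only for illustration, and the strings stand in for the native and copied implementations.

```python
def version_tuple(version):
    """Turn a string such as '2.9.0' into (2, 9, 0).

    Hypothetical helper: parsing stops at the first component that does
    not start with a digit, and trailing non-digit suffixes are ignored.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def choose_pad(installed, minimum, native_pad, fallback_pad):
    """Return native_pad if the installed version is new enough."""
    if version_tuple(installed) >= version_tuple(minimum):
        return native_pad
    return fallback_pad


# Stand-ins for the real implementations.
print(choose_pad("1.2.0", "1.7.0", "native", "fallback"))
```

Comparing tuples of integers avoids the string-comparison trap in which "1.10.0" sorts before "1.7.0".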

> mode="mean" converts an array of integers to floats in Dask, but in numpy it keeps it an array of integers.

That's weird. Open an issue at the dask repo with an example?

Feel free to open an in-progress PR. Thanks for working on this.

@dcherian
Contributor

dcherian commented Dec 3, 2019

Also see #3587

@mark-boer
Contributor

I will create a WIP pull request tomorrow.

@lanougue

lanougue commented Dec 6, 2019

Hi,
I was looking for an xarray padding function and found this issue.
For the moment, I have written a function of my own based on numpy.pad and xr.apply_ufunc.
When possible, it also pads the associated coordinates. In case it is of any help here...
Here it is:

import numpy as np
import xarray as xr


def xpad(ds, dims=None):
    """
    Pad an xarray object. Coordinates are linearly extended if the original coordinates are evenly spaced. Otherwise, no new coordinates are assigned to the padded axis.
    Each padded dimension is renamed with the prefix 'padded_'.

    Args:
        ds (xarray): object to pad
        dims (dict): keys are the dimensions along which to pad and values are padding tuples (see np.pad), e.g. {'pulse': (10, 0)}

    Returns:
        (xarray): same as input, with padded axes.
    """
    dims = dims or {}  # avoid a mutable default argument
    # apply_ufunc moves the core (padded) dimensions to the end, so the
    # zero-width pads for the untouched dimensions come first.
    mypad = [(0, 0) for n in ds.dims if n not in dims]
    mypad += list(dims.values())
    padded_ds = xr.apply_ufunc(
        np.pad, ds, mypad,
        input_core_dims=[list(dims), []],
        output_core_dims=[['padded_' + d for d in dims]],
        keep_attrs=True,
    )

    for var, ext in dims.items():
        dvar = np.diff(ds[var])
        if np.allclose(dvar, dvar[0]):
            dvar = dvar[0]
            # left_bound is the first coordinate value and right_bound the last,
            # whether the coordinate is ascending or descending.
            left_bound, right_bound = (np.min(ds[var]).data, np.max(ds[var]).data) if dvar > 0. else (np.max(ds[var]).data, np.min(ds[var]).data)
            extended_var = np.append(ds[var].data, right_bound + np.arange(1, ext[1] + 1) * dvar)
            extended_var = np.append(left_bound + np.arange(-ext[0], 0) * dvar, extended_var)
            padded_ds = padded_ds.assign_coords(**{'padded_' + var: extended_var})
        else:
            print('Coordinates {} are not evenly spaced, padding is impossible'.format(var))
    return padded_ds

@lanougue

lanougue commented Dec 6, 2019

Oh, sorry... I just saw the PR.
