Add pad and unpad functions #158
I started drafting a If we want to add an optional padding inside the I haven't added any tests for the case where the coordinates are dates or timestamps. I'm not used to working with such datatypes and found that there are a lot of ways of defining date ranges ( @bennyrowland @roxyboy @rabernat I would like to hear your thoughts on this.
xrft/padding.py
    return padded_da


def pad_coordinates(coords, pad_width):
@santisoler this looks really cool. Can we upstream this to xarray?
It would depend on finishing this PR first: https://github.com/pydata/xarray/pull/4974/files but that one is really close
Absolutely. Like I said to @shoyer in #146, I'll be happy to implement the padding of evenly spaced coordinates in xarray. I would just need a short discussion about the design of the feature and what the default behavior of the pad method should be: Should it pad the coordinates by default if they are evenly spaced? Should the coordinates be padded only if an optional argument is passed?
Feel free to ping me after pydata/xarray#4974 is merged and we can start working on it.
# Start padding the coordinates that appear in pad_width
for dim in pad_width:
    # Get the spacing of the selected coordinate
    # (raises an error if not evenly spaced)
I am not sure if the correct behaviour is to error for unevenly spaced coordinates, maybe it is better to fall back on the existing behaviour of pad (particularly for applications in xarray outside of FTs)?
I agree that for a general case it should fall back to the default behaviour of the xarray pad method. Nevertheless, for xrft I think we should raise the error. If the coordinates are not evenly spaced, you won't be able to pass the padded grid to fft (or you will get awkward errors related to the presence of NaNs in the coordinates).
In summary:
- I would leave the error raising inside xrft.
- I wouldn't add the error on an xarray implementation of this feature.
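To make the behaviour discussed above concrete, here is a minimal plain-Python sketch of padding an evenly spaced coordinate by extrapolating its spacing, and raising an error otherwise. The helper name `pad_coordinate` and its arguments are hypothetical and chosen only for illustration; this is not the code in xrft/padding.py.

```python
import math


def pad_coordinate(values, pad_before, pad_after, rel_tol=1e-9):
    """Extend an evenly spaced 1D coordinate by extrapolating its spacing.

    Hypothetical helper for illustration only, not the xrft implementation.
    """
    if len(values) < 2:
        raise ValueError("Need at least two points to infer the spacing.")
    spacing = values[1] - values[0]
    # Refuse unevenly spaced coordinates, as argued for xrft in this thread
    for left, right in zip(values[:-1], values[1:]):
        if not math.isclose(right - left, spacing, rel_tol=rel_tol):
            raise ValueError("Coordinate is not evenly spaced.")
    # Extrapolate outwards from the first and last values
    before = [values[0] - spacing * i for i in range(pad_before, 0, -1)]
    after = [values[-1] + spacing * i for i in range(1, pad_after + 1)]
    return before + list(values) + after
```

A general-purpose version (for example, one living in xarray) could fall back to the existing NaN-filling behaviour instead of raising, which is the alternative suggested in the review comment.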
@santisoler great work, I really like what you have done so far. I agree with @dcherian that it should be pretty straightforward to move this into xarray, although there are bound to be some advanced cases I haven't considered. Personally I am happy with just padding float coordinates, although I am sure some people will want to use date/timestamps. Maybe we need to start by putting in some more checks on datatypes for the coordinates, and falling back on existing behaviour for datatypes we don't currently want to handle? |
Thanks @bennyrowland! For my own applications, I'm totally ok with just supporting padding of floats. Nevertheless, I think it's important to also support datetimes. If we merge this only supporting floats, it's very likely we will get issues asking to extend the support to datetimes. Like I said before, I'm not familiar with the different ways of defining datetimes, so I cannot come up with a good solution on how to support each one of them. That's why I would love to get some help on it.
Hi everyone, First : thanks for the effort ! I have some comments/questions (probably more related to my applications):
Thanks ! |
Hi @lanougue,
I do think that we should support other types of variables for the coordinates (e.g. dates), but this is probably not going to happen unless someone with a specific use case (and testing cases) wants to tackle it. And regarding moving this feature upstream to xarray: pydata/xarray#4974 hasn't been merged yet, so we probably should merge this PR here and wait until it can be a source of inspiration for the upstream version.
Sure, I don't see a problem with that. In fact this is something that should be done: coords attributes should be in agreement with the actual coordinates arrays present in the
Hi @santisoler, I agree with you on this xarray upstream issue. Personally, I do not know of any application of padding other than Fourier transforms or convolution problems at the boundaries... I thus imagine it can take a while for the xarray community to care about padding. But who knows?! Meanwhile, I am in favor of merging this PR too! As far as I know, xrft plays with only two coordinate attributes, which are "spacing" and "direct_lag". These two attributes should in fact stay unchanged through the padding procedure (after verifying that the coordinate is indeed evenly spaced).
@lanougue I've just updated this branch with |
The changes look nice. I have not played with your function yet, but one interesting option, from my point of view, would be to choose whether or not to keep the original name of the coordinates. I think it could be safer to give a new name to the padded coordinates. The default behavior would be (x -> padded_x) unless some keyword argument like "keep_coords_name" is set to True. Another thing, I would rather write: These two comments are only suggestions and may be discussed (and maybe passed to upstream discussions concerning the first suggestion).
You're absolutely right! That's not only better in terms of sanity, but is also way more readable.
I think I disagree here. One of the advantages of |
Ok, it makes sense. It would be nice if other contributors could give their opinion about that. |
I get it and I agree that it would be great to have a way to recover the unpadded array.

# Pad an array
pad_width = {
    "x": (10, 10),
    "y": (5, 5),
}  # define amount of padding
grid_padded = xrft.pad(grid, pad_width)
# Perform some actions on the padded array
# ...
# Unpad it
grid = xrft.unpad(grid_padded, pad_width)

It would take the same I'm more prone to tell users to keep track of the Besides, any user could store the

# Pad an array
pad_width = {
    "x": (10, 10),
    "y": (5, 5),
}  # define amount of padding
grid_padded = xrft.pad(grid, pad_width)
grid_padded.attrs["pad_width"] = pad_width
# Perform some actions on the padded array
# ...
# Unpad it
grid = xrft.unpad(grid_padded, grid_padded.pad_width)

What do you think?
It reminds me exactly of the discussion I had with @roxyboy and @rabernat when I was working on the xrft.ifft function. It is the same logic for both functions. We can let the user manually store the required information to recover the initial state or we can do it automatically.
@santisoler @lanougue Thanks for your work and input. I agree that the |
You're right! Thanks for helping me remember what the issue behind it was. Since we want to support a very straightforward round trip like One alternative is to keep the In conclusion, I think we actually need to store the padding on each padded coordinate inside the DataArray.

# Pad an array
pad_width = {
    "x": (10, 10),
    "y": (5, 5),
}  # define amount of padding
grid_padded = xrft.pad(grid, pad_width)
# Perform some actions on the padded array
# ...
# Unpad it
grid = xrft.unpad(grid_padded)

A part of me wants to support custom

grid = xrft.unpad(grid_padded, pad_width={"x": ..., "y": ...})

But if no one will use it, there's no point in adding that feature. I'll try to implement the
The numpy logic would be to specify a "shape" parameter in xrft.fft, but I think that the xrft logic should depart from this behaviour. I prefer your approach where you let the user do the padding/unpadding procedure. As you said, it keeps fft and ifft much simpler (and does not limit their possibilities). I would also like to get the complete round trip: I think that your feature (unpadding with a custom pad_width) can easily be resolved by making the default pad_width (=None?) the normal one, meaning the same one used in the forward padding (which would thus be stored somewhere). Waiting for your ping!
Extend the examples of pad() and test the new feature.
The function reads the pad_width attribute from the array coordinates in order to slice it to its unpadded version. Add test functions for the new feature.
I would gladly participate on discussions around this design decision. My idea of making
I've just pushed the implementation of
Sure! We can set |
I think that adding the pad_width attribute for the entire array is a bad idea for 2 reasons:
Ok for the round trip issue. We have to transfer the pad_width attribute through the fft/ifft operations. |
I agree: bad idea to store the So, let me add support for passing a custom |
The pad_width can be optionally passed to unpad as a single pad_width argument or as kwargs for each coordinate. By default pad_width is None, therefore the pad_width is read from the attributes of the array coordinates. Add test functions for the new argument.
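The logic described in this commit message can be sketched in plain Python for a single dimension. This is only an illustration under stated assumptions: `pad_1d`, `unpad_1d` and the `attrs` dict are hypothetical stand-ins for the padded DataArray and its coordinate attributes, not the xrft functions.

```python
def pad_1d(values, pad_width, fill=0):
    """Pad a list with a constant and record the pad width that was used.

    The returned dict is a stand-in for the coordinate attributes in which
    the PR stores pad_width.
    """
    before, after = pad_width
    padded = [fill] * before + list(values) + [fill] * after
    attrs = {"pad_width": (before, after)}
    return padded, attrs


def unpad_1d(padded, attrs, pad_width=None):
    """Undo the padding by slicing.

    If pad_width is None, it is read from the stored attributes;
    otherwise the custom value takes precedence.
    """
    if pad_width is None:
        if "pad_width" not in attrs:
            raise ValueError("No pad_width stored and none was passed.")
        pad_width = attrs["pad_width"]
    before, after = pad_width
    return padded[before : len(padded) - after]


padded, attrs = pad_1d([1, 2, 3], (2, 1))
restored = unpad_1d(padded, attrs)  # uses the stored pad_width
partially_unpadded = unpad_1d(padded, attrs, pad_width=(1, 0))  # custom width
```

Reading the default from the stored attribute keeps the simple round trip short, while the explicit argument covers the case where the attribute was lost along the way.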
By making it a private function we avoid listing it in the API reference.
@santisoler, thanks for the great work on this. I think that if I wanted to do a round trip I have also previously suggested that alternative forms of the parameter currently called |
Hi @bennyrowland. I think the behavior currently proposed here does not cause any problem for what you want to do. You need to provide pad_width to whichever of the pad or unpad functions you call first. In the same spirit, the pad function could also look for an already existing pad_width (or unpad_width) that would come from a previous unpad call.
I understand. I personally agree with your workflow: for me it's easier to ask for
I remembered you mentioning that you usually want to pad your arrays by specifying the final size instead of the width. I could argue that you can do that with the current

new_size_x, new_size_y = 256, 256
pad_width = {"x": (new_size_x - grid.x.size) // 2, "y": (new_size_y - grid.y.size) // 2}
xrft.pad(grid, pad_width)

But I know it can be tedious to do every time. I decided to go with

Option 1

My first suggestion would be creating different functions for your case, which can wrap the

def pad_to(
    da,
    size,
    mode="constant",
    stat_length=None,
    constant_values=None,
    end_values=None,
    reflect_type=None,
    **size_kwargs,
):
    # Redefine size if size_kwargs were passed
    size = either_dict_or_kwargs(size, size_kwargs, "pad")
    # Build pad_width from the size
    pad_width = {dim: int(0.5 * (size[dim] - da[dim].size)) for dim in size}
    # Call pad
    return xrft.pad(da, pad_width, ...)

(we can work on the names). So you could pad by passing final sizes, like this:

size = {"x": 256, "y": 256}
grid_padded = xrft.pad_to(grid, size)

Option 2

Another option would be to have a utils function to convert from

def size_to_pad_width(da, size):
    pad_width = {dim: int(0.5 * (size[dim] - da[dim].size)) for dim in size}
    return pad_width

So you could pad using the sizes, like this:

size = {"x": 256, "y": 256}
pad_width = size_to_pad_width(grid, size)
grid_padded = xrft.pad(grid, pad_width)

The benefit of this is that we will have a single function for padding and a single function for unpadding. The drawback is that we would need to write one more line of code when using it. That being said, I would leave these ideas for a new PR.
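One detail the size-based proposals above leave open is what happens when the number of missing samples is odd: halving the difference and truncating pads one sample too few. A minimal plain-Python sketch of a conversion that returns explicit (before, after) tuples is shown below. The signature is an assumption made for illustration (it takes the current sizes as a dict instead of a DataArray) and it is not part of xrft.

```python
def size_to_pad_width(current_sizes, target_sizes):
    """Convert target sizes into (before, after) pad widths per dimension.

    Hypothetical variant of the helper proposed in this thread. When the
    number of missing samples is odd, the extra sample goes after the data.
    """
    pad_width = {}
    for dim, target in target_sizes.items():
        missing = target - current_sizes[dim]
        if missing < 0:
            raise ValueError(
                f"Target size for '{dim}' is smaller than the current size."
            )
        before = missing // 2
        pad_width[dim] = (before, missing - before)
    return pad_width


# Example: pad a 251 x 256 grid up to 256 x 256
pad_width = size_to_pad_width({"x": 251, "y": 256}, {"x": 256, "y": 256})
```

Putting the extra sample after the data is an arbitrary choice; a different convention (or a user-selectable one) would work equally well as long as the total matches the requested size.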
Am I to understand that now the |
@santisoler, yes but it is (of course) more complicated than that. In my case, I do have asymmetric data which I want to pad to be symmetric, as well as to a specific length, so it definitely isn't a simple enough calculation to do manually every time. I acknowledge that using |
Done @roxyboy! I've found that the line you wrote above works well for the array, but not for the coordinates (they get shifted around the origin), so we need to pass So the roundtrip looks like:

da_padded = pad(da, pad_width, constant_values=0)
da_fft = fft(da_padded, true_phase=True)
da_ifft = ifft(da_fft, true_phase=True)
da_unpadded = unpad(da_ifft, pad_width=pad_width)

On a side note, I remembered that the current
Awesome! Thanks @santisoler . We're almost at the finish line. I agree with |
Done! The docstrings were specifying that
You mean that Thank you for all the feedback! This PR wouldn't have been possible without it! |
Yes. I'll merge this and make a new release. |
I didn't get why true_phase=True/False interferes with the padding. Whatever the value of true_phase is, it should not change the coordinates...
When setting Let me know if I explained myself, otherwise I can provide an example. |
Ok, I understand. Thanks! When true_phase == False, the coordinate range is indeed lost during the fft/ifft operations and the unpadding cannot do the job correctly. You are right, the round trip can only be done correctly when true_phase is set to True during both the fft and ifft operations.
Add a new pad function that wraps the xarray.DataArray.pad method, extrapolating evenly spaced coordinates. The pad_width argument is stored within the attributes of each coordinate of the padded array. Add a new unpad function that undoes the padding process by slicing it. By default it reads the pad_width from the coordinate attributes of the input array. Alternatively, we can pass a custom pad_width as argument to the function. List the padding module in the API reference. Add test functions for the new features with 100% coverage of the new padding module.
Fixes #146
TODO (on a different PR):
- fft and ifft keep the pad_width attribute of the coordinates.