Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow dataset interpolation with different datatypes #5008

Merged

Conversation

Illviljan
Copy link
Contributor

@Illviljan Illviljan commented Mar 7, 2021

Allow different datatypes (in particular booleans) to be interpolated instead of dropping them.

@pep8speaks
Copy link

pep8speaks commented Mar 7, 2021

Hello @Illviljan! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-05-13 08:27:50 UTC

@Illviljan Illviljan marked this pull request as draft March 7, 2021 16:41
@dcherian
Copy link
Contributor

dcherian commented Mar 9, 2021

This seems like a good change but shouldn't we do this only when method="nearest"?

@Illviljan
Copy link
Contributor Author

Illviljan commented Mar 10, 2021

How do you linearly interpolate strings, objects or booleans?

The problem for me is that we drop all the non-valid variables when I want linear interpolation for most of my stuff. One can argue it could be either of bfill, ffill or nearest but I chose nearest because that's an option for both missing.interp and reindex_variables. I suppose we can add it as an option? But what should the option be called when we want a backup interpolation method that does not modify the elementwise values? And should it then only be reindex_variables specific?

Another case I've been pondering how to handle is when numbers are used as IDs. Then it doesn't make sense to do interpolation that messes with the elements as well. But these integers are not easily determined though because you can use integers just for performance reasons and then wouldn't mind it becoming floats later. Maybe they will continue staying a special case.

@dcherian
Copy link
Contributor

How do you linearly interpolate strings, objects or booleans?

This is why we drop them now. my point is that doing "nearest" when the user has specified method="linear" is v. confusing. Perhaps we should have method="linear+nearest" (or something better) for what you propose: linear for numeric, nearest for everything else.

Q: Does the current version of xarray just ignore strings, objects & booleans when method="nearest"?

xarray is not sensitive to parameter order  anymore, but move the parameter to the end anyway to stay backwards compatible.
@Illviljan
Copy link
Contributor Author

Well, as a user I was very confused why my boolean variables silently disappeared.
And I don't get why ds.interp has any mandate to drop variables, if it can't handle interpolating a variable with the method an error should've been raised.
But I'm a lazy user too so I of course would prefer to not have to deal with errors or figure out which variables were dropped and do multiple interpolation steps and finally merge the different datasets back together again. This is why I prefer ds.interp to do the only current method that makes sense for non-numerics rather than dropping them.

ds.interp currently only interpolates if var.dtype.kind in "uifc", so it ignores bools, strings, and objects for any method. Which made sense I think because the scipy interpolants always attempts to force to float. This is why I use reindex_variables as it both retains the dtype and handles strings/objects just fine.

I've added a parameter now method_for_non_numerics for these cases now, so now you can choose which reindex method you want.

@Illviljan Illviljan marked this pull request as ready for review March 13, 2021 16:28
@max-sixty
Copy link
Collaborator

Let's discuss this at the next dev meeting.

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Illviljan Apologies for the delay.

I have some minor comments but this looks very nice!

xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
Illviljan and others added 6 commits May 13, 2021 09:49
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
@Illviljan
Copy link
Contributor Author

@dcherian, renamed the parameter and fixed some errors due to the #5102 merge.

Docs failure seems unrelated.

@dcherian
Copy link
Contributor

Thanks @Illviljan ! Great PR. Thank you for your patience

@dcherian dcherian merged commit 1fa7b9b into pydata:master May 13, 2021
@dcherian
Copy link
Contributor

Oops just noticed this is missing a whats-new note. Can you send in a PR with that note please?

dcherian added a commit to TomNicholas/xarray that referenced this pull request May 13, 2021
* upstream/master:
  combine keep_attrs and combine_attrs in apply_ufunc (pydata#5041)
  Explained what a deprecation cycle is (pydata#5289)
  Code cleanup (pydata#5234)
  FacetGrid docstrings (pydata#5293)
  Add whats new for dataset interpolation with non-numerics (pydata#5297)
  Allow dataset interpolation with different datatypes (pydata#5008)
  Flexible indexes: add Index base class and xindexes properties (pydata#5102)
  pre-commit: autoupdate hook versions (pydata#5280)
  convert the examples for apply_ufunc to doctest (pydata#5279)
  fix the new whatsnew section
  Ensure `HighLevelGraph` layers are `Layer` instances (pydata#5271)
dcherian added a commit to matzegoebel/xarray that referenced this pull request May 13, 2021
* upstream/master: (23 commits)
  combine keep_attrs and combine_attrs in apply_ufunc (pydata#5041)
  Explained what a deprecation cycle is (pydata#5289)
  Code cleanup (pydata#5234)
  FacetGrid docstrings (pydata#5293)
  Add whats new for dataset interpolation with non-numerics (pydata#5297)
  Allow dataset interpolation with different datatypes (pydata#5008)
  Flexible indexes: add Index base class and xindexes properties (pydata#5102)
  pre-commit: autoupdate hook versions (pydata#5280)
  convert the examples for apply_ufunc to doctest (pydata#5279)
  fix the new whatsnew section
  Ensure `HighLevelGraph` layers are `Layer` instances (pydata#5271)
  New whatsnew section
  Release-workflow: Bug fix (pydata#5273)
  more maintenance on whats-new.rst (pydata#5272)
  v0.18.0 release highlights (pydata#5266)
  Fix exception when display_expand_data=False for file-backed array. (pydata#5235)
  Warn ignored keep attrs (pydata#5265)
  Disable workflows on forks (pydata#5267)
  fix the built wheel test (pydata#5270)
  pypi upload workflow maintenance (pydata#5269)
  ...
dcherian added a commit to gcaria/xarray that referenced this pull request May 13, 2021
…e_units

* upstream/master:
  combine keep_attrs and combine_attrs in apply_ufunc (pydata#5041)
  Explained what a deprecation cycle is (pydata#5289)
  Code cleanup (pydata#5234)
  FacetGrid docstrings (pydata#5293)
  Add whats new for dataset interpolation with non-numerics (pydata#5297)
  Allow dataset interpolation with different datatypes (pydata#5008)
  Flexible indexes: add Index base class and xindexes properties (pydata#5102)
dcherian added a commit to ahuang11/xarray that referenced this pull request May 13, 2021
* upstream/master:
  Update release guide (pydata#5274)
  combine keep_attrs and combine_attrs in apply_ufunc (pydata#5041)
  Explained what a deprecation cycle is (pydata#5289)
  Code cleanup (pydata#5234)
  FacetGrid docstrings (pydata#5293)
  Add whats new for dataset interpolation with non-numerics (pydata#5297)
  Allow dataset interpolation with different datatypes (pydata#5008)
@Illviljan Illviljan deleted the Illviljan-dataset_interp_several_dtypes branch May 18, 2021 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dataset.interp drops boolean variables
5 participants