Undesired decoding to timedelta64 (was: units of "seconds" translated to time coordinate) #1621

Open
pacioos opened this issue Oct 10, 2017 · 16 comments

Comments

@pacioos

pacioos commented Oct 10, 2017

When using open_dataset(), xarray translates data variables with units of "seconds" into time coordinates, for example measurements of wave period. I don't believe xarray should treat variables as time coordinates unless their units are of the form "seconds since...". I have noticed that changing my units to "second", "sec", or "s" prevents xarray from translating the variable to datetime64 and keeps it float64, as desired. More details and an OPeNDAP example are posted on StackOverflow here: https://stackoverflow.com/questions/46552078/xarray-wave-period-in-seconds-ingested-as-timedelta64
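
To make the reported behavior concrete, here is a minimal pure-Python sketch of the distinction being asked for. It is not xarray's actual code: `classify_units` and `BARE_TIME_UNITS` are hypothetical names, and the set of bare units is an assumption chosen to match the report above, where "seconds" is decoded but "second", "sec", and "s" are not.

```python
# Hypothetical set of bare unit strings that get treated as time differences.
# Assumption: matches the report above ("seconds" is decoded; "s" is not).
BARE_TIME_UNITS = frozenset(
    [
        "days",
        "hours",
        "minutes",
        "seconds",
        "milliseconds",
        "microseconds",
        "nanoseconds",
    ]
)


def classify_units(units):
    """Return 'datetime', 'timedelta', or 'plain' for a units attribute."""
    text = units.strip()
    # CF reference-time form: "<time unit> since <date>".
    if " since " in text:
        return "datetime"
    # A bare time unit with no reference date: the case this issue is about.
    if text in BARE_TIME_UNITS:
        return "timedelta"
    # Anything else is left as ordinary numeric data.
    return "plain"


for units in ["seconds since 1970-01-01", "seconds", "sec", "s"]:
    print(units, "->", classify_units(units))
```

Under this sketch, a wave-period variable with units of "seconds" lands in the "timedelta" branch, which is the behavior the report objects to.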

@shoyer
Member

shoyer commented Oct 10, 2017

Thanks for filing an issue -- I usually follow the xarray tag on StackOverflow but missed this one.

I wrote an answer on StackOverflow to explain why this works this way and how to work around it. As I say in my answer, this is intended behavior, but I'm open to ideas for improvement. You are not the first person to complain about this, specifically for handling of wave periods. See #843 and links therein for prior discussion (CC @ocefpaf).

@pacioos
Author

pacioos commented Oct 10, 2017

Thanks @shoyer. I understand the issue better now. I had not considered timedeltas. That does complicate the issue. There is no sure-fire solution, then, so I'll continue with the workaround. Cheers.

@rsignell-usgs

rsignell-usgs commented Oct 20, 2017

On https://stackoverflow.com/a/46675990/2005869, @shoyer explains:

My understanding of CF standard names is that forecast_period should be equal to the difference between time and forecast_reference_time, i.e., forecast_period = time - forecast_reference_time. If you specified your time_offset variable with units in the form "hours", then it would be decoded to timedelta64, along with datetime64 for time and time_run, so xarray's arithmetic would actually satisfy this identity. You might find this useful if you only wanted to include two of these variables and wanted to calculate the third on the fly. On the other hand, you probably don't want to convert the Tper variable to timedelta64. Technically, it is also a time period, but it's not a variable that makes sense to compare to time.
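
The identity in the quoted answer can be illustrated with the standard library alone. This is only a sketch with made-up example values; xarray would use datetime64 and timedelta64 arrays rather than Python `datetime` objects.

```python
from datetime import datetime, timedelta

# Illustrative values only (not taken from any dataset in this thread).
forecast_reference_time = datetime(2017, 10, 10, 0, 0)
time = datetime(2017, 10, 10, 6, 0)

# forecast_period = time - forecast_reference_time
forecast_period = time - forecast_reference_time
print(forecast_period)

# Any one of the three quantities can be recovered from the other two.
recovered_time = forecast_reference_time + forecast_period
print(recovered_time == time)
```

A wave period has the same physical dimension, but adding it to `time` in this way would not produce anything meaningful, which is the distinction the answer draws.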

I understand the potential issue here, but I think Xarray should follow CF conventions for time, and only treat variables as time coordinates if they have valid CF time units (<time unit> since <date>).

We know of thousands of datasets (every dataset with waves!) where the current Xarray behavior is a problem.

@shoyer
Member

shoyer commented Oct 20, 2017

I understand the potential issue here, but I think Xarray should follow CF conventions for time, and only treat variables as time coordinates if they have valid CF time units (<time unit> since <date>).

Rich, I know you've been involved in CF conventions and standard names, but I don't think the CF conventions on time apply directly here. These are "time difference" units, which are a distinct type of quantity. And assuredly the period of a wave is a type of "time difference" as well -- it's just not one that is useful to decode into an array with dtype np.timedelta64. Really this is a limitation of the non-ideal support for units in the NumPy ecosystem.

Is there some other sort of metadata we could use to make this distinction of "physical" vs. "human" time differences? Now is a good time to make changes, since we are on the verge of making a major release (v0.10).

Throwing out some ideas, none of which I particularly like (but to be clear, I don't like the status quo either):

  1. We could unilaterally stop automatic decoding into timedelta64. (We would need to add a separate helper function that could be called to do these conversions afterwards.)
  2. We could add look-up tables that recognize standard_name attributes, either a white-list or black-list for timedelta64 compatible variables. (This would be a first for xarray, and is not something I'm particularly looking forward to maintaining.)
  3. We could decode coordinates and data variables differently, only converting coordinates into timedelta64. (This is not entirely ideal either, since it's easy to switch between the two in xarray's data model.)
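
As a rough illustration of the first idea, an explicit opt-in helper might look like the sketch below. The name `to_timedeltas` is hypothetical and is not an xarray function; the sketch uses plain Python lists and `datetime.timedelta` instead of NumPy arrays to stay self-contained.

```python
from datetime import timedelta

# Unit names that datetime.timedelta accepts as keyword arguments.
SUPPORTED_UNITS = (
    "days",
    "hours",
    "minutes",
    "seconds",
    "milliseconds",
    "microseconds",
)


def to_timedeltas(values, units):
    """Convert numbers to timedeltas only when the caller asks for it."""
    if units not in SUPPORTED_UNITS:
        raise ValueError(f"unsupported time unit: {units!r}")
    return [timedelta(**{units: float(value)}) for value in values]


# Wave periods stay as plain floats unless the helper is called on them.
wave_period_seconds = [8.0, 9.5, 11.0]

# Forecast lead times are converted explicitly by the user.
lead_times = to_timedeltas([6, 12], "hours")
print(lead_times)
```

The design point is that nothing is converted automatically: the user decides which variables are "human" time differences.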

@rabernat
Contributor

I vote for 1, plus a verbose warning message.

I have never found timedelta64 indices to be particularly useful.

@ocefpaf
Contributor

ocefpaf commented Oct 21, 2017

I have never found timedelta64 indices to be particularly useful.

Same here. 👍 for 1

PS: 2 could be the start of a nice "CF-addon" package for xarray but I don't think it should be in the xarray code.

@jhamman
Member

jhamman commented Oct 21, 2017

I don't have a strong opinion here but 1 seems best.

@rsignell-usgs

I vote for 1 also. How many makes a quorum? 😸

@shoyer
Member

shoyer commented Oct 24, 2017

OK, sounds like there is consensus on removing this. I would still like there to be an option for doing this sort of decoding, because I'm sure somebody finds this useful (at least I did, back when I wrote it!).

In particular, it would be nice to have some way to round-trip the timedelta64 dtype. A simple way to do this would be to recognize the attribute dtype='timedelta64[ns]' (as an xarray-specific convention) and use that for decoding/encoding timedelta64 dtypes.

My suggested path forward:

  1. Add decoding support for recognizing dtype='timedelta64[ns]' and decoding it into the NumPy dtype. We have some very similar examples already (e.g., for dtype=bool), so this should not be hard.
  2. Write all timedelta64 dtype data in netCDF files by saving the dtype attribute instead of units.
  3. Issue a FutureWarning explaining the upcoming change, triggered whenever a bare time unit (e.g., units='seconds') is detected.
  4. In the next major release of xarray, stop decoding time units.
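
Steps 1 and 2 of this path could be sketched roughly as follows. This is an illustration with NumPy only, not the code in xarray/conventions.py; the function names `encode_timedelta` and `decode_timedelta_attr` are hypothetical, and storing the values as int64 nanoseconds is an assumption about how the on-disk form might look.

```python
import numpy as np

TIMEDELTA_DTYPE = "timedelta64[ns]"


def encode_timedelta(values, attrs):
    """Store timedelta64 data as int64 nanoseconds plus a dtype attribute."""
    values = np.asarray(values)
    if values.dtype.kind != "m":
        # Not timedelta data: pass through unchanged.
        return values, dict(attrs)
    out_attrs = dict(attrs)
    out_attrs["dtype"] = TIMEDELTA_DTYPE
    return values.astype(TIMEDELTA_DTYPE).astype("int64"), out_attrs


def decode_timedelta_attr(values, attrs):
    """Decode to timedelta64 only when the dtype attribute asks for it."""
    values = np.asarray(values)
    if attrs.get("dtype") != TIMEDELTA_DTYPE:
        # A bare units attribute such as "seconds" no longer triggers decoding.
        return values, dict(attrs)
    out_attrs = {key: val for key, val in attrs.items() if key != "dtype"}
    return values.astype("int64").astype(TIMEDELTA_DTYPE), out_attrs


lead = np.array([1, 36], dtype="timedelta64[h]")
stored, stored_attrs = encode_timedelta(lead, {"long_name": "lead time"})
restored, restored_attrs = decode_timedelta_attr(stored, stored_attrs)
print(stored.dtype, stored_attrs)
print(restored.dtype, restored_attrs)

# A wave period with units of "seconds" is left alone.
period, period_attrs = decode_timedelta_attr([8.0, 9.5], {"units": "seconds"})
print(period.dtype)
```

With a marker attribute like this, round-tripping no longer depends on interpreting the units string, so physical quantities measured in seconds stay numeric.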

Anyone interested in taking this on? All the logic can be found in xarray/conventions.py.

@shoyer shoyer changed the title from "units of "seconds" translated to time coordinate" to "Undesired decoding to timedelta64 (was: units of "seconds" translated to time coordinate)" Apr 26, 2018
@ocefpaf ocefpaf mentioned this issue May 4, 2018
4 tasks
@stale

stale bot commented Mar 26, 2020

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

@stale stale bot added the stale label Mar 26, 2020
@alexamici
Collaborator

@shoyer I'm hitting this on xarray 0.15.1 so I guess the effort had stalled.

I'm quite happy with coordinates being decoded to timedelta64, but I'd love to have the option to not decode time intervals in data variables.

Is there any plan to come back to this? And would the preferred solution still be the abovementioned one?

I'll need to spend some effort fixing our use case, possibly with a horrible work-around, so depending on the complexity of the preferred solution, I may be able to work on a PR for xarray instead.

@stale stale bot removed the stale label Apr 16, 2020
@shoyer
Member

shoyer commented Apr 16, 2020

I still think the ideal path forward is #1621 (comment), but clearly nobody was excited about taking on this effort :).

I do still think we probably should retain a way to serialize/unserialize timedelta64 data before we switch the default behavior, rather than breaking existing users without any recourse.

That said, we certainly could add an optional flag for disabling decoding to timedelta64 (e.g., decode_timedelta=False in open_dataset/decode_cf) now, without changing anything else in xarray. The default flag switch could be saved until later, when the new timedelta64 serialization (steps 1 and 2 from above) works.

@alexamici
Collaborator

Ok, either @aurghs or I will take a shot at implementing decode_timedelta=False in the near future.

@rabernat
Contributor

Now is a good time to make changes, since we are on the verge of making a major release (v0.10).

That ship has clearly sailed quite a while ago! 🤣 But I think I speak for many when I say THANK YOU @alexamici for taking this up again. Many people will still be very happy if this gets implemented.

shoyer pushed a commit that referenced this issue May 19, 2020
* add decode_timedelta kwarg in decode_cf and open_* functions and test.

* Fix style issue

* Add chang author reference

* removed check decode_timedelta in open_dataset

* fix docstring indentation

* fix: force dtype in test decode_timedelta
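
For readers landing here, a minimal sketch of the keyword added by this commit, assuming an xarray version that includes it. The dataset is built in memory with made-up values, and `decode_timedelta` is passed explicitly in both calls because the default behavior may differ between versions.

```python
import xarray as xr

# In-memory stand-in for a wave dataset; values are illustrative only.
raw = xr.Dataset(
    {
        "Tper": (
            "station",
            [8.0, 9.5, 11.0],
            {"units": "seconds", "long_name": "wave period"},
        )
    }
)

# Explicitly opting in reproduces the decoding discussed in this issue.
decoded = xr.decode_cf(raw, decode_timedelta=True)
print(decoded["Tper"].dtype)

# Opting out keeps the variable numeric and leaves its units attribute alone.
kept = xr.decode_cf(raw, decode_timedelta=False)
print(kept["Tper"].dtype, kept["Tper"].attrs["units"])
```

According to the commit message, the same keyword is also accepted by the open_* functions, e.g. `xr.open_dataset(path, decode_timedelta=False)` with your own file path.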
@stale

stale bot commented May 1, 2022

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

@stale stale bot added the stale label May 1, 2022
@shoyer
Member

shoyer commented May 1, 2022

Still relevant!

@stale stale bot removed the stale label May 1, 2022

7 participants