Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix datetime_to_numeric and Variable._to_numeric #2668

Merged
merged 15 commits into from
Feb 11, 2019

Conversation

fujiisoup
Copy link
Member

@fujiisoup fujiisoup commented Jan 11, 2019

Started to fixing #2667

else:
print(type(array), array.dtype)
print(array)
return array.astype(dtype)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tests pass a np.ndarray with object dtype, an entry of which is datetime.datetime, such as

<class 'numpy.ndarray'> object
[datetime.timedelta(0) datetime.timedelta(days=1)
 datetime.timedelta(days=2)]

Is it an expected behavior?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think astype() does any conversion for object arrays.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fujiisoup this function might be helpful for casting the object arrays:

def _possibly_convert_objects(values):
"""Convert arrays of datetime.datetime and datetime.timedelta objects into
datetime64 and timedelta64, according to the pandas convention.
"""
return np.asarray(pd.Series(values.ravel())).reshape(values.shape)

Copy link
Member Author

@fujiisoup fujiisoup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still tests are failing.
Probably I do not well understand how cftime is working now...

@shoyer
Copy link
Member

shoyer commented Jan 11, 2019

@spencerkclark what convention do you use for missing values in cftime arrays?

@spencerkclark
Copy link
Member

what convention do you use for missing values in cftime arrays?

Currently there is nothing built in to xarray to support missing values for cftime arrays. I think we would have to come up with a convention.

@shoyer
Copy link
Member

shoyer commented Jan 11, 2019 via email

@shoyer
Copy link
Member

shoyer commented Jan 11, 2019 via email

@pep8speaks
Copy link

pep8speaks commented Jan 13, 2019

Hello @fujiisoup! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on February 10, 2019 at 15:04 Hours UTC

@fujiisoup
Copy link
Member Author

I couldn't make datetime_to_numeric compatible both np.ndarray and Variable. I think it would be cleaner if Variable has a (private) method to do this, like Variable._to_numeric(offset=None, datetime_unit=None).

@shoyer
Copy link
Member

shoyer commented Jan 13, 2019

I couldn't make datetime_to_numeric compatible both np.ndarray and Variable. I think it would be cleaner if Variable has a (private) method to do this, like Variable._to_numeric(offset=None, datetime_unit=None).

Sounds good to me. It's definitely best to handle numpy arrays and xarray.Variable objects separately.

Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fujiisoup for your working cleaning up this logic; I have a few initial comments/questions.

@@ -40,6 +40,8 @@ Enhancements
Bug fixes
~~~~~~~~~

- Bug fix for interpolation with an datetime array. (:issue:`2668`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @dcherian caught this bug before it made it into a release, so maybe we do not need a bug-fix note?

xarray/tests/test_duck_array_ops.py Show resolved Hide resolved

@requires_cftime
@requires_scipy
def test_cftime_interp():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think you can find a better name for this test? It does not seem to use cftime.

@@ -24,6 +24,7 @@
from .coordinates import (
DatasetCoordinates, LevelCoordinatesSource,
assert_coordinate_consistent, remap_label_indexers)
from .duck_array_ops import datetime_to_numeric
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Within differentiate, we use datetime_to_numeric on Variable objects; I think you should switch to the new Variable-specific method there.

array = array - offset

if datetime_unit:
array = array / np.timedelta64(1, datetime_unit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to move this to after we potentially convert an array of datetime.timedelta objects to np.timedelta64 (otherwise this will raise an error).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To explain this a little more -- for cftime dates, array - offset (a few lines above) will return an array of datetime.timedelta objects. When datetime_to_numeric was used with xarray.Variable objects, we did not have to worry about converting to np.timedelta64, because that would happen automatically. In the case of pure NumPy or dask arrays, we do.

@fujiisoup
Copy link
Member Author

Thanks, @spencerkclark

It looks I need more work for this (probably a week).
It is not working for cftime stuffs, but I do not yet understand how it is handled in xarray.

@spencerkclark
Copy link
Member

It looks I need more work for this (probably a week).
It is not working for cftime stuffs, but I do not yet understand how it is handled in xarray.

Thanks @fujiisoup, no worries. There was only one place where I saw a potential issue for cftime dates, which I noted above. I made a PR to your branch with the fix (fujiisoup#9); let me know if there were other places with issues and I can sort those out too.

Some minor changes to try and help with cftime dates
@fujiisoup
Copy link
Member Author

Thanks, @spencerkclark, for the PR.
It was super helpful.

I will finish it up tonight (hopefully).

@fujiisoup fujiisoup changed the title WIP: fix regression about datetime_to_numeric fix datetime_to_numeric and Variable._to_numeric Jan 14, 2019
Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty good @fujiisoup.

In #2667 you mentioned:

Then, it would be probably nice if datetime_to_numeric only accepts np.ndarray and da.array not Variable or DataArray, and move this function to duck_array_ops.

By da.array I'm assuming you mean dask arrays? Should we strive to have compatibility in datetime_to_numeric there? Right now for dask arrays with dtype datetime64[ns] it eagerly loads the data, while for cftime dask arrays there is an error. I think both issues would be reasonably straightforward to fix, but do you think it's worth the effort?

Again I'm happy to help out, particularly on the cftime side.

@@ -573,3 +573,17 @@ def test_cftime_to_non_cftime_error():

with pytest.raises(TypeError):
da.interp(time=0.5)


@requires_cftime
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, please remove this decorator too.

# Conflicts:
#	xarray/core/dataset.py
#	xarray/core/missing.py
@fujiisoup
Copy link
Member Author

@spencerkclark

Sorry for leaving it for a long time.
I was a little stressed these days...

Should we strive to have compatibility in datetime_to_numeric there?

As I'm not a heavy user of datetime arrays, I personally do not mind if we do not have dask compatibility for datetime_to_numeric.
Probably it would be OK to leave it for the later development if someone wants this feature.

@spencerkclark
Copy link
Member

As I'm not a heavy user of datetime arrays, I personally do not mind if we do not have dask compatibility for datetime_to_numeric.

No worries @fujiisoup, this is totally fine with me. I think the most common use case for datetime arrays is in indexes, which currently need to be stored in memory anyway, so I don't think the dask use case would come up very often.

@fujiisoup fujiisoup merged commit 4cd56a9 into pydata:master Feb 11, 2019
@fujiisoup
Copy link
Member Author

Thanks @spencerkclark for reviewing. Merged.

@fujiisoup fujiisoup deleted the fix_datetime_to_numeic branch February 11, 2019 09:47
@spencerkclark
Copy link
Member

Thanks for wrapping this up @fujiisoup!

dcherian pushed a commit to yohai/xarray that referenced this pull request Feb 14, 2019
* master:
  typo in whats_new (pydata#2763)
  Update computation.py to use Python 3 function signatures (pydata#2756)
  add h5netcdf+dask tests (pydata#2737)
  Fix name loss when masking (pydata#2749)
  fix datetime_to_numeric and Variable._to_numeric (pydata#2668)
  Fix mypy errors (pydata#2753)
  enable internal plotting with cftime datetime (pydata#2665)
  remove references to cyordereddict (pydata#2750)
  BUG: Pass kwargs to the FileManager for pynio engine (pydata#2380) (pydata#2732)
  reintroduce pynio/rasterio/iris to py36 test env (pydata#2738)
  Fix CRS being WKT instead of PROJ.4 (pydata#2715)
  Refactor (part of) dataset.py to use explicit indexes (pydata#2696)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

datetime interpolation doesn't work
4 participants