
Numpy 1.18 support #3537

Merged 7 commits on Nov 19, 2019

Conversation

@crusaderky (Contributor) commented Nov 15, 2019

Fix mean() and nanmean() for datetime64 arrays on the numpy backend, which broke when upgrading from numpy 1.17 to 1.18.
All other nan-reductions on datetime64s were broken before and remain broken.
mean() on datetime64 with the dask backend was broken before and remains broken.

@crusaderky (Contributor, Author)

xref #3434

@crusaderky (Contributor, Author)

Ready for review and merge

@crusaderky mentioned this pull request on Nov 15, 2019
@mathause (Collaborator)

Given numpy/numpy#14841, is this worth the extra complexity?

Currently xarray is only broken with respect to numpy master, and there is a chance numpy/numpy#14841 gets into 1.18; then we could just use np.nanmin?

Also, is this the only place where min is taken on datetimes? Should this also go into the operator dispatcher?

@dopplershift (Contributor) commented Nov 15, 2019

IMO, it's always best to release code that you know will work rather than rely on upstream to get something into the next release in order for you not to be broken. I say that both as a downstream user of xarray and a library maintainer.

@dcherian (Contributor)

LGTM. I think we should merge so we can get a release out.

@crusaderky (Contributor, Author) commented Nov 15, 2019

@dopplershift fully agree.

is this the only place of min for datetimes?

This PR does not fix the nan-reductions for datetime arrays.

It very narrowly fixes the use case of mean() and nanmean() with a numpy backend, so that they work on numpy 1.18 as they did on 1.17. As a matter of fact they never worked with dask, were never tested, and still won't work after the PR.

To emphasize: I am not introducing any new feature or fixing bugs on numpy 1.17; I'm just getting rid of regressions on the upgrade from 1.17 to 1.18.
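At the numpy level, the behavior change behind the regression can be seen directly. A minimal sketch, assuming numpy >= 1.18, where min() propagates NaT and nanmin() gained datetime64 support:

```python
import numpy as np

# a datetime64 array containing NaT ("not a time")
arr = np.array(["2000-01-01", "NaT", "2000-01-03"], dtype="datetime64[ns]")

# on numpy >= 1.18, min() propagates NaT...
assert np.isnat(np.min(arr))
# ...while nanmin() skips it (per this thread, what min() did on 1.17)
assert np.nanmin(arr) == np.datetime64("2000-01-01")
```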

Specifically, the PR fixes this failing test:

@arm_xfail
@pytest.mark.parametrize("dask", [False, True])
def test_datetime_reduce(dask):
    time = np.array(pd.date_range("15/12/1999", periods=11))
    time[8:11] = np.nan
    da = DataArray(np.linspace(0, 365, num=11), dims="time", coords={"time": time})
    if dask and has_dask:
        chunks = {"time": 5}
        da = da.chunk(chunks)
    actual = da["time"].mean()
    assert not pd.isnull(actual)
    actual = da["time"].mean(skipna=False)
    assert pd.isnull(actual)
    # test for a 0d array
    assert da["time"][0].mean() == da["time"][:1].mean()

which was not well formulated: despite its name, it tests only mean(), not all reductions, and in fact never runs on dask (the index of a dask array is not chunked).
I have now rewritten it for the sake of clarity.

If numpy/numpy#14841 makes it into master before 1.18 is published, and if it actually delivers on making reductions work with NaT (I have not tested it yet), I'll be happy to write a second PR and rework everything.
Even then, a switch of this sort will still be needed:

from distutils.version import LooseVersion
import numpy as np

def datetime_nanmin(array, **kwargs):
    if LooseVersion(np.__version__) < "1.18":
        return np.min(array, **kwargs)  # np.min skipped NaT before 1.18
    else:
        return np.nanmin(array, **kwargs)  # 1.18+: skip NaT explicitly

The actual code will be in fact more complicated because you also need to think about dask.

@max-sixty (Collaborator)

Agree—let's merge?

@mathause (Collaborator)

Ok, thanks for the clarification.

@crusaderky (Contributor, Author) commented Nov 15, 2019

Updated PR description:

Fix mean() and nanmean() for datetime64 arrays on the numpy backend, which broke when upgrading from numpy 1.17 to 1.18.
All other nan-reductions on datetime64s were broken before and remain broken.
mean() on datetime64 with the dask backend was broken before and remains broken.

@crusaderky (Contributor, Author)

I just realised there are no tests at all for datetime_to_numeric(offset=None) on a dask backend. I suspect I broke something; I need to look into it more closely.
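For context, a minimal numpy-only sketch of what such a helper does. The signature and the offset-defaults-to-the-array-minimum behavior are assumptions here, simplified from the real function in duck_array_ops, which also has to handle dask and NaT:

```python
import numpy as np

def datetime_to_numeric(array, offset=None, datetime_unit="D", dtype=float):
    # simplified sketch: convert datetime64 values to numbers relative
    # to an offset, which (an assumption here) defaults to array.min()
    if offset is None:
        offset = array.min()
    # the subtraction yields timedelta64; cast to the requested unit,
    # then to a plain numeric dtype
    return (array - offset).astype(f"timedelta64[{datetime_unit}]").astype(dtype)

dates = np.array(["2000-01-01", "2000-01-03"], dtype="datetime64[ns]")
datetime_to_numeric(dates)  # array([0., 2.])
```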

@pep8speaks commented Nov 18, 2019

Hello @crusaderky! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-11-19 09:23:14 UTC

@crusaderky (Contributor, Author)

datetime_to_numeric(offset=None) never worked with dask to begin with; there's a big fat TODO about it in the code. I rewrote most of the PR; now mean() works with dask.

Pre-empting an observation: yes, the nanmin function I wrote could be used to work around the shortcomings of both numpy and dask on datetime64. I don't think xarray should do that, and IMHO the workaround currently implemented for mean() should never have been done in the first place.

Again, this PR only fixes the regression on numpy 1.18.
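For illustration only, a hedged sketch of what such a workaround nanmin could look like (hypothetical, not the code merged in this PR): it exploits the fact that NaT is stored as the minimum int64 value, masks it with nan, and reduces in float space:

```python
import numpy as np

def datetime_nanmin_compat(array):
    # hypothetical sketch, not the merged implementation: NaT is stored
    # as the minimum int64 value, so view as int64, mask it with nan,
    # reduce with nanmin, and cast the result back to datetime64
    ints = array.view("int64").astype("float64")
    ints[array.view("int64") == np.iinfo("int64").min] = np.nan
    return np.nanmin(ints).astype("int64").astype(array.dtype)

times = np.array(["2000-01-03", "NaT", "2000-01-01"], dtype="datetime64[D]")
datetime_nanmin_compat(times)  # numpy.datetime64('2000-01-01')
```

Note that going through float64 loses precision for nanosecond timestamps far from the epoch (int64 nanoseconds exceed float64's 53-bit mantissa), one reason such a workaround is best avoided.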

@crusaderky (Contributor, Author)

Ready for review and merge

Review thread on xarray/core/duck_array_ops.py (outdated, resolved)
@crusaderky crusaderky merged commit 45fd0e6 into pydata:master Nov 19, 2019
@crusaderky crusaderky deleted the numpy118 branch November 19, 2019 14:06
dcherian added a commit to dcherian/xarray that referenced this pull request Nov 22, 2019
* master: (24 commits)
  Tweaks to release instructions (pydata#3555)
  Clarify conda environments for new contributors (pydata#3551)
  Revert to dev version
  0.14.1 whatsnew (pydata#3547)
  sparse option to reindex and unstack (pydata#3542)
  Silence sphinx warnings (pydata#3516)
  Numpy 1.18 support (pydata#3537)
  tweak whats-new. (pydata#3540)
  small simplification of rename from pydata#3532 (pydata#3539)
  Added fill_value for unstack (pydata#3541)
  Add DatasetGroupBy.quantile (pydata#3527)
  ensure rename does not change index type (pydata#3532)
  Leave empty slot when not using accessors
  interpolate_na: Add max_gap support. (pydata#3302)
  units & deprecation merge (pydata#3530)
  Fix set_index when an existing dimension becomes a level (pydata#3520)
  add Variable._replace (pydata#3528)
  Tests for module-level functions with units (pydata#3493)
  Harmonize `FillValue` and `missing_value` during encoding and decoding steps (pydata#3502)
  FUNDING.yml (pydata#3523)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Dec 4, 2019
* upstream/master: (22 commits)
  Resolve the version issues on RTD (pydata#3589)
  Add bottleneck & rasterio git tip to upstream-dev CI (pydata#3585)
  update whats-new.rst (pydata#3581)
  Examples for quantile (pydata#3576)
  add cftime intersphinx entries (pydata#3577)
  Add pyXpcm to Related Projects doc page (pydata#3578)
  Reimplement quantile with apply_ufunc (pydata#3559)
  add environment file for binderized examples (pydata#3568)
  Add drop to api.rst under pending deprecations (pydata#3561)
  replace duplicate method _from_vars_and_coord_names (pydata#3565)
  propagate indexes in to_dataset, from_dataset (pydata#3519)
  Switch examples to notebooks + scipy19 docs improvements (pydata#3557)
  fix whats-new.rst (pydata#3554)
  Tweaks to release instructions (pydata#3555)
  Clarify conda environments for new contributors (pydata#3551)
  Revert to dev version
  0.14.1 whatsnew (pydata#3547)
  sparse option to reindex and unstack (pydata#3542)
  Silence sphinx warnings (pydata#3516)
  Numpy 1.18 support (pydata#3537)
  ...
Linked issue this pull request may close: test failure with upstream-dev
6 participants