Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow chunk_store argument when opening Zarr datasets #3804

Merged
merged 11 commits into from
Aug 25, 2020
Merged

Allow chunk_store argument when opening Zarr datasets #3804

merged 11 commits into from
Aug 25, 2020

Conversation

ajelenak
Copy link
Contributor

This is a simple modification to pass chunk_store argument to the zarr package when opening Zarr datasets.

I don't think this requires any documentation changes and new tests.

@rsignell-usgs
Copy link

This PR is motivated by the work described in this Medium blog post

@rabernat
Copy link
Contributor

rabernat commented Feb 27, 2020

Fantastic, thanks @ajelenak so much for this contribution!

Two items come to mind:

  • this will definitely need a test, just something simple to verify that the chunk_store argument works as expected
  • curious what will happen if you pass an actual Zarr storage object. I assume it will work fine, since these are all mutable mappings.

@ajelenak
Copy link
Contributor Author

@rabernat Added a test that uses an actual Zarr store.

Copy link
Contributor

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the update.

The new test is unfortunately failing with the error

E       TypeError: open() got multiple values for keyword argument 'chunk_store'

Could you look into this?

As noted in comment below, this test also makes me wonder whether we want a chunk_store argument in to_zarr as well. It might make sense to do them both as part of this PR.

xarray/tests/test_backends.py Show resolved Hide resolved
@rabernat
Copy link
Contributor

Sorry for letting this hang so long. Now that zarr-developers/zarr-python#557 has been merged and zarr 2.4.0 has been released, we should be able to finish this off.

The easiest thing would be to just require zarr > 2.4.0. However, we currently don't require zarr at all...it's an optional dependency. The code in this PR will fail for older versions of zarr. What's the best way to handle this? Having a bunch of internal checks for zarr versions feels clunky, but we've done it before.

xarray/backends/zarr.py Outdated Show resolved Hide resolved
@dcherian
Copy link
Contributor

It'd be nice to add some documentation to https://xarray.pydata.org/en/stable/io.html but that can happen in a future PR.

@rsignell-usgs
Copy link

rsignell-usgs commented Jul 28, 2020

@dcherian , are we just waiting for one more "+1" here, or are the failing checks related to this PR?

* upstream/master: (36 commits)
  enable fail_on_warning for RTD (#4269)
  update the list item numbers in the release guide (#4264)
  remove the compatibility code in duck_array_ops.allclose_or_equiv (#4270)
  Un-xfail cftime plotting tests (#4272)
  xfail failing upstream plotting tests (#4271)
  Improve some error messages: apply_ufunc & set_options. (#4259)
  Support cupy in as_shared_dtype (#4232)
  Fix DataArray.copy documentation: remove confusing mention of 'dataset' (Gh3606) (#4245)
  Removed skipna argument from count, any, all [GH755] (#4258)
  Added xarrays-spatial and updated geoviews link (#4262)
  update docs to point to xarray-contrib and xarray-tutorial (#4252)
  Add release summary, some touch-ups (#4217)
  CFTimeIndex calendar in repr (#4092)
  fix the RTD timeouts (#4254)
  update isort CI and pre-commit hook (#4204)
  Add initial cupy tests (#4214)
  Add 0.16.0 release summary
  New whatsnew section
  Release v0.16.0
  Minor reorg of whatsnew for 0.16.0 (#4216)
  ...
Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this LGTM as long as tests pass.

@rabernat?

@jhamman jhamman closed this Aug 5, 2020
@jhamman jhamman reopened this Aug 5, 2020
Copy link
Member

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cycled the CI to (hopefully) fix the failing test. In the meantime, two other comments popped out at me.

xarray/tests/test_sparse.py Outdated Show resolved Hide resolved
xarray/backends/zarr.py Show resolved Hide resolved
@dcherian dcherian added topic-backends topic-zarr Related to zarr storage library labels Aug 14, 2020
…hunkstore

 Fixed conflicts in:
• xarray/core/dataset.py
• xarray/tests/test_sparse.py
@ajelenak
Copy link
Contributor Author

Is there anything else to be done in this PR, or is it ready for merge? All tests pass.

@dcherian
Copy link
Contributor

Looks good to me. I will merge in a couple of days if no one else has any comments.

@rsignell-usgs
Copy link

rsignell-usgs commented Aug 25, 2020

Drumroll.... @dcherian, epic cymbal crash?

@dcherian
Copy link
Contributor

Thanks for being patient @ajelenak I am excited to see this go in. welcome to xarray!

@dcherian dcherian merged commit 9c85dd5 into pydata:master Aug 25, 2020
@ajelenak
Copy link
Contributor Author

Thanks @dcherian . And thanks to all who provided suggestions and comments in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-backends topic-zarr Related to zarr storage library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants