Zarr consolidated #2559
Hello @rabernat! Thanks for updating the PR.
Comment last updated on December 04, 2018 at 19:34 Hours UTC |
Ping @lilyminium for a review. |
xarray/backends/api.py
Outdated
if consolidate:
    import zarr
    zarr.consolidate_metadata(store)
    # do we need to reload store now that we have consolidated?
would it make sense for zarr to handle this?
What do you mean?
I meant reloading the zarr store automatically.
I think that would be hard to achieve. And I'm not sure it's necessary. Frankly I don't know why we return a store object from to_zarr at all.
zarr.consolidate_metadata returns the output of open_consolidated on the same store, so this is already happening.
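For readers following the thread, here is a toy model of the behaviour described in that last comment: consolidation copies every metadata document into one key and then hands back the consolidated view. It is a sketch on a plain dict, not zarr's implementation; `toy_consolidate`, `toy_open_consolidated`, and the exact JSON layout are assumptions made for illustration.

```python
import json

# Key names that hold metadata in a zarr v2 style store.
META_NAMES = (".zgroup", ".zarray", ".zattrs")


def toy_open_consolidated(store, metadata_key=".zmetadata"):
    """Return all metadata documents by reading a single key."""
    return json.loads(store[metadata_key])["metadata"]


def toy_consolidate(store, metadata_key=".zmetadata"):
    """Gather every metadata document into one key, then reopen via that key."""
    metadata = {
        key: json.loads(value)
        for key, value in store.items()
        if key.rsplit("/", 1)[-1] in META_NAMES
    }
    store[metadata_key] = json.dumps({"metadata": metadata}).encode()
    # Mirror the comment above: the caller gets the consolidated view back.
    return toy_open_consolidated(store, metadata_key=metadata_key)


store = {
    ".zgroup": b'{"zarr_format": 2}',
    "temp/.zarray": b'{"shape": [4], "chunks": [2]}',
    "temp/.zattrs": b'{"units": "K"}',
    "temp/0": b"\x00\x01",  # chunk data, left untouched
}
view = toy_consolidate(store)
```

Because the function already returns the reopened view, the caller has nothing left to reload, which is the point being made in the thread.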
Also need to add some version checks...this will only work with zarr > 2.2.
xarray/backends/zarr.py
Outdated
def __init__(self, zarr_group):
if consolidated or consolidate_on_close:
    if LooseVersion(zarr.__version__) <= '2.2':  # pragma: no cover
reminder to update this version check too.
Being more explicit about the version seems to fix this issue here. In the tests I have used the importorskip approach.
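One pitfall a check like the one in the diff can hit, as far as I can tell: `LooseVersion` compares component lists, so `'2.2.0'` sorts after `'2.2'` and a `<= '2.2'` guard would let zarr 2.2.0 through. A dependency-free sketch of a more explicit comparison follows. The helper names are hypothetical, and the threshold (anything newer than the 2.2 series) is taken from the comment earlier in the thread rather than from zarr's release notes.

```python
import re


def version_tuple(version, width=2):
    """Parse the leading numeric components of a version string, zero-padded."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. "dev0"
        parts.append(int(match.group()))
    parts = parts[:width]
    return tuple(parts + [0] * (width - len(parts)))


# Assumption from the thread: consolidated metadata needs zarr newer than 2.2.
LAST_UNSUPPORTED = version_tuple("2.2")


def supports_consolidated(zarr_version):
    """True if the major.minor series is newer than 2.2."""
    return version_tuple(zarr_version) > LAST_UNSUPPORTED
```

Comparing only major and minor components means `"2.2"` and `"2.2.0"` are treated identically, which is the ambiguity the padded tuple is meant to remove.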
Not sure I understand why there are tests failing now. The failing function is https://travis-ci.org/pydata/xarray/jobs/460873430#L7489 At first glance, this does not appear to have anything to do with my PR. The relevant error is:
I bet this is due to the latest dask release (1.0). We can fix this in another PR.
I remember dealing with this in my pull request -- if I recall correctly scheduler was pointing to the scheduler.get function instead. It was a minor bug that was either fixed in the next release of xarray (0.11.0) or Dask (0.20.1).
So if the test issues can be considered resolved, the only decision we need to make is about the API. Do we prefer (the current way):
ds.to_zarr(fname, consolidate=True)
xr.open_zarr(fname, consolidated=True)
or @shoyer's suggestion:
ds.to_zarr(fname, consolidated=True)
xr.open_zarr(fname, consolidated=True)
???
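For concreteness, the second spelling (the same keyword on both sides) would be used roughly as below. This is a minimal sketch, not code from the PR: `roundtrip_consolidated` is a hypothetical helper, it assumes xarray and a zarr release with consolidated-metadata support are installed, and `mode="w"` is added here only so the example can overwrite an earlier run.

```python
def roundtrip_consolidated(ds, path):
    """Write `ds` with consolidated metadata, then reopen it the same way."""
    # Imported lazily so that defining this helper does not require xarray.
    import xarray as xr

    # Writing: the data is stored and the metadata is consolidated afterwards.
    ds.to_zarr(path, mode="w", consolidated=True)
    # Reading: ask for the consolidated metadata instead of per-array lookups.
    return xr.open_zarr(path, consolidated=True)
```

With an `xarray.Dataset` in hand, `roundtrip_consolidated(ds, "example.zarr")` should return a lazily loaded dataset equivalent to `ds`.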
Will the default for both options be |
Yes |
Glad to see this happening, by the way. Once in, catalogs using intake-xarray can be updated and I don't think the code will need to change.
Great to see this. On the API, FWIW I'd vote for using the same keyword ( |
Keywords are now all Ready to merge? |
I think this is basically ready. I had a few small questions/comments but this looks safe for a merge here soon.
@@ -36,6 +36,8 @@ Breaking changes
Enhancements
~~~~~~~~~~~~

- Ability to read and write consolidated metadata in zarr stores.
  By `Ryan Abernathey <https://github.com/rabernat>`_.
Can you reference the issue this is attached to: (:issue:`2558`).
open_kwargs = dict(mode=mode, synchronizer=synchronizer, path=group)
if consolidated:
    # TODO: an option to pass the metadata_key keyword
do we need to consider this TODO here?
open_kwargs = dict(mode=mode, synchronizer=synchronizer, path=group)
if consolidated:
    # TODO: an option to pass the metadata_key keyword
Anything to do here now?
Do we feel that it's important to expose this functionality from within xarray? I don't.
I also don't.
I think it's ok for xarray to have an opinion on what the special key is called.
I propose we just leave these TODO's here as is. If anyone ever needs this feature from the xarray side, this will help guide them on how to implement it.
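If anyone does pick up that TODO later, one possible shape is to forward an optional key only on the consolidated path. The helper below is hypothetical (`build_open_kwargs` is not an xarray function) and returns the name of the zarr opener as a string so the sketch stays dependency-free. It assumes, as I understand zarr 2.x, that `zarr.open_consolidated` accepts a `metadata_key` keyword and `zarr.open_group` does not.

```python
def build_open_kwargs(mode, synchronizer, group,
                      consolidated=False, metadata_key=None):
    """Return (opener_name, kwargs) for opening a zarr group.

    Hypothetical helper: `metadata_key` is only forwarded when
    `consolidated` is true, and is otherwise rejected.
    """
    open_kwargs = dict(mode=mode, synchronizer=synchronizer, path=group)
    if consolidated:
        if metadata_key is not None:
            # Leave zarr's own default in place unless the caller overrides it.
            open_kwargs["metadata_key"] = metadata_key
        return "open_consolidated", open_kwargs
    if metadata_key is not None:
        raise ValueError("metadata_key only applies when consolidated=True")
    return "open_group", open_kwargs
```

Leaving the key out of the kwargs by default keeps xarray's current opinion about the key name while still letting a caller override it.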
LGTM. Do you think there should be more explicit text on how to add consolidation to existing zarr/xarray datasets, rather than creating them with consolidation turned on? We may also need some text around updating consolidated datasets, but that can maybe wait to see what kind of usage people try.
Since xarray cannot append or modify in-place existing zarr stores, this seems outside the scope of xarray for now. But maybe it is worth mentioning in the docs. |
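If the docs do mention it, a short recipe along these lines might be enough. This is a sketch, not text from the PR: `consolidate_existing_store` is a hypothetical helper, it assumes `path` points at an existing zarr store that nothing else is writing to, and it assumes `zarr.consolidate_metadata` accepts a path string in the installed zarr version.

```python
def consolidate_existing_store(path):
    """Add consolidated metadata to an existing store, then open it via xarray."""
    # Imported lazily so that defining this helper does not require the libraries.
    import xarray as xr
    import zarr

    # Consolidation is done with zarr directly; xarray is not involved here.
    zarr.consolidate_metadata(path)
    # From now on the store can be opened through the consolidated metadata.
    return xr.open_zarr(path, consolidated=True)
```

If the store is modified afterwards, the consolidated copy would presumably need to be regenerated the same way, which is the "updating" case raised in the question above.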
I'm happy here. ...but Appveyor is not. |
@rabernat if you're ready, let's merge this. The failures on Appveyor are unrelated (an issue with int32 and cftime) |
👍
On Dec 4, 2018, at 6:37 PM, Stephan Hoyer wrote:
@rabernat if you're ready, let's merge this. The failures on Appveyor are unrelated (an issue with int32 and cftime)
If anyone wants to see how awesome consolidated metadata is, you can try it in this binder: I did a bit of lazy profiling here: Things that used to take ~40s now take ~1s. Especially since loading the data is one of the first steps in any pangeo notebook, this is a huge improvement in usability. Thanks to everyone who helped make it happen! |
I like those timings. |
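A toy model of where a speedup like that can come from: on a high-latency store, every metadata key is its own request, so one consolidated key replaces one request per array. This is not the binder benchmark; `CountingStore` and the layout below are stand-ins invented for illustration.

```python
import json


class CountingStore(dict):
    """Dict-backed stand-in for a remote store that counts key reads."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1
        return super().__getitem__(key)


def build_store(n_arrays):
    """Create per-array metadata keys plus one consolidated key."""
    store = CountingStore()
    for i in range(n_arrays):
        store[f"var{i}/.zarray"] = json.dumps({"shape": [10]}).encode()
    combined = {key: json.loads(value) for key, value in store.items()}
    store[".zmetadata"] = json.dumps({"metadata": combined}).encode()
    store.reads = 0  # only count reads made after setup
    return store


def read_individually(store, n_arrays):
    """One store read per array, as without consolidation."""
    keys = [f"var{i}/.zarray" for i in range(n_arrays)]
    return {key: json.loads(store[key]) for key in keys}


def read_consolidated(store):
    """A single store read, however many arrays there are."""
    return json.loads(store[".zmetadata"])["metadata"]


store = build_store(50)
separate = read_individually(store, 50)
reads_separate = store.reads
store.reads = 0
combined = read_consolidated(store)
reads_combined = store.reads
```

Here the same metadata costs 50 reads one way and a single read the other; with real network latency per request, that ratio is what shows up as wall-clock time.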
* upstream/master:
  Feature: N-dimensional auto_combine (pydata#2553)
  Support HighLevelGraphs (pydata#2603)
  Bump cftime version in doc environment (pydata#2604)
  use keep_attrs in binary operations II (pydata#2590)
  Temporarily mark dask-dev build as an allowed failure (pydata#2602)
  Fix wrong error message in interp() (pydata#2598)
  Add dayofyear and dayofweek accessors (pydata#2599)
  Fix h5netcdf saving scalars with filters or chunks (pydata#2591)
  Minor update to PR template (pydata#2596)
  Zarr consolidated (pydata#2559)
  fix examples (pydata#2581)
  Fix typo (pydata#2578)
  Concat docstring typo (pydata#2577)
  DOC: remove example using Dataset.T (pydata#2572)
  python setup.py test now works by default (pydata#2573)
  Return slices when possible from CFTimeIndex.get_loc() (pydata#2569)
  DOC: fix computation.rst (pydata#2567)
This PR adds support for reading and writing of consolidated metadata in zarr stores.
whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)