Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add options for zarr array definition #48

Merged
merged 3 commits into from
Sep 21, 2020

Conversation

eric-czech
Copy link
Contributor

@eric-czech eric-czech commented Sep 13, 2020

Fixes #46

This adds options to rechunk that are passed to zarr create methods. The only use cases I have in mind for this are allowing for overwrites and compression, but it may be useful in other ways. Otherwise it may eventually be worth lifting those parameters into the rechunk signature given that allowing any option to be passed is a bit of a footgun.

Note: I had to modify one of the existing test fixtures since it was doing almost what I wanted for a compression test except that the chunks were too small for compression to take effect. I didn't know that occurred with zarr but I can't find any other explanation for why specifying a compressor makes no difference when an array is too small. I have no idea where the cutoff is but perhaps it is related to compression block sizes? Let me know if anybody understands that better.

@codecov
Copy link

codecov bot commented Sep 13, 2020

Codecov Report

Merging #48 into master will increase coverage by 0.10%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #48      +/-   ##
==========================================
+ Coverage   94.89%   95.00%   +0.10%     
==========================================
  Files          10       10              
  Lines         392      400       +8     
  Branches       75       78       +3     
==========================================
+ Hits          372      380       +8     
  Misses         10       10              
  Partials       10       10              
Impacted Files Coverage Δ
rechunker/algorithm.py 82.45% <ø> (ø)
rechunker/api.py 92.53% <100.00%> (+0.47%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e454e85...ac03623. Read the comment docs.

Copy link
Collaborator

@tomwhite tomwhite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

rechunker/api.py Outdated
@@ -281,14 +311,29 @@ def _setup_rechunk(
raise ValueError("Source must be a Zarr Array or Group, or a Dask Array.")


def _validation_options(options):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: _validate_options might read better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eric-czech
Copy link
Contributor Author

Thanks @tomwhite. Let me know if there's anything else I can do to get this merged @TomAugspurger / @rabernat.

@TomAugspurger TomAugspurger merged commit 8917e20 into pangeo-data:master Sep 21, 2020
@TomAugspurger
Copy link
Member

Thanks.

@rabernat
Copy link
Member

rabernat commented Sep 21, 2020

Thanks for all your contributions @eric-czech. It's great to have you involved in rechunker!

@eric-czech
Copy link
Contributor Author

Sure thing @rabernat!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Set compression for target store
5 participants