eraint_uvz test_full fails #139

Closed
aaronspring opened this issue Oct 2, 2022 · 4 comments · Fixed by #140

@aaronspring
Collaborator

Description

https://github.com/observingClouds/xbitinfo/actions/runs/3168552683/jobs/5159830575

assert compressed_size < ori_size
assert 9640058 < 4186398

Did the defaults in to_netcdf change with the new xarray release?

@observingClouds
Owner

I checked the sizes for the different xarray releases:

2022.06.0:

ori_size = 4186398
compressed_size = 4131352
bitrounded_compressed_size = 496448

2022.09.0:

ori_size = 4186398
compressed_size = 9640058
bitrounded_compressed_size = 447607
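
For scale, the sizes above can be expressed relative to ori_size. The snippet below is only arithmetic on the reported numbers (the dictionary layout and variable names are illustrative). It shows that the plain compressed file goes from about 0.99 of the original size under 2022.06.0 to roughly 2.3 times the original under 2022.09.0, while the bitrounded file stays near a tenth of the original in both releases.

```python
# Sizes in bytes as reported above; ori_size is identical for both releases.
ORI_SIZE = 4186398

reported = {
    "2022.06.0": {"compressed": 4131352, "bitrounded_compressed": 496448},
    "2022.09.0": {"compressed": 9640058, "bitrounded_compressed": 447607},
}

# Ratio of each output file to the original file (values below 1 mean smaller).
ratios = {
    release: {name: size / ORI_SIZE for name, size in sizes.items()}
    for release, sizes in reported.items()
}

for release, by_name in ratios.items():
    for name, ratio in by_name.items():
        print(f"{release} {name}: {ratio:.3f} x ori_size")
```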

@observingClouds
Owner

observingClouds commented Oct 2, 2022

Okay, I think I know what is going on here. There are two issues:

  • The dtype of the output changed from float32 to float64 between xarray 2022.06.0 and 2022.09.0, so the compression is less efficient.
  • Passing the encoding argument to xr.Dataset.to_netcdf overwrites the original encoding. As a consequence, the values are written with add_offset and scale_factor already applied, as float rather than as the int16 the original dataset is stored in. Zlib on float32 is efficient enough to beat int16 with scale_factor and add_offset, but it can't keep up when the data is float64.
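
To make the dtype effect concrete, here is a minimal stdlib-only sketch. It takes synthetic int16 values (not the eraint_uvz data), unpacks them with the scale_factor and add_offset listed further down, and serializes the result as int16, float32 and float64 before running zlib. No shuffle filter is applied and the data is made up, so the compressed sizes it prints are illustrative only and are not expected to reproduce the numbers in this issue. What it does show reliably is that the raw payload doubles at each step.

```python
import math
import struct
import zlib

# Packing parameters taken from the original encoding quoted in this issue.
SCALE_FACTOR = -1.7250274674967954
ADD_OFFSET = 66825.5

# Synthetic packed values within the int16 range (an assumption of this sketch).
packed = [int(30000 * math.sin(i / 50)) for i in range(10_000)]

# Unpacking as done on read: stored * scale_factor + add_offset.
decoded = [p * SCALE_FACTOR + ADD_OFFSET for p in packed]


def serialize(values, fmt):
    """Serialize values as little-endian binary with a fixed item size."""
    return struct.pack(f"<{len(values)}{fmt}", *values)


raw = {
    "int16": serialize(packed, "h"),     # 2 bytes per value
    "float32": serialize(decoded, "f"),  # 4 bytes per value
    "float64": serialize(decoded, "d"),  # 8 bytes per value
}

# (raw size, zlib level 9 size) for each representation.
sizes = {name: (len(buf), len(zlib.compress(buf, 9))) for name, buf in raw.items()}

for name, (n_raw, n_zlib) in sizes.items():
    print(f"{name:8s} raw={n_raw:6d} bytes  zlib={n_zlib:6d} bytes")
```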

For reference:

  • the original encoding:
{
    'source': '.../xarray_tutorial_data/e4bb6ebf67663eeab3ff30beae6a5acf-eraint_uvz.nc',
    'original_shape': (2, 3, 241, 480),
    'dtype': dtype('int16'),
    '_FillValue': nan,
    'scale_factor': -1.7250274674967954,
    'add_offset': 66825.5
}
  • the encoding after get_compress_encoding_nc is called within to_compressed_netcdf:
{
    'zlib': True,
    'shuffle': True,
    'complevel': 9,
    'chunksizes': (2, 3, 241, 480)
}
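
The commit that closed this issue is described as merging the dataset's encoding with the compression encoding. The following is a rough sketch of that idea using plain dictionaries. The helper merge_encodings and the BOOKKEEPING_KEYS set are hypothetical names for illustration and are not xbitinfo's actual implementation. The dtype is written as the string 'int16' to keep the example free of third-party imports.

```python
import math

# Keys that describe where the data came from rather than how to write it.
# Dropping them is an assumption of this sketch.
BOOKKEEPING_KEYS = {"source", "original_shape"}


def merge_encodings(original, compression):
    """Hypothetical helper: keep the original packing, add compression settings.

    Compression keys win if both dictionaries define the same key.
    """
    merged = {k: v for k, v in original.items() if k not in BOOKKEEPING_KEYS}
    merged.update(compression)
    return merged


original_encoding = {
    "source": ".../xarray_tutorial_data/e4bb6ebf67663eeab3ff30beae6a5acf-eraint_uvz.nc",
    "original_shape": (2, 3, 241, 480),
    "dtype": "int16",  # string stand-in for numpy's dtype('int16')
    "_FillValue": math.nan,
    "scale_factor": -1.7250274674967954,
    "add_offset": 66825.5,
}

compression_encoding = {
    "zlib": True,
    "shuffle": True,
    "complevel": 9,
    "chunksizes": (2, 3, 241, 480),
}

merged = merge_encodings(original_encoding, compression_encoding)
print(merged)
```

A merged dictionary like this, built per variable, keeps dtype, scale_factor and add_offset alongside the zlib settings, so that passing it through the encoding argument of to_netcdf no longer discards the int16 packing.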

@observingClouds observingClouds mentioned this issue Oct 2, 2022
@observingClouds
Owner

Related upstream issue: pydata/xarray#7127

@observingClouds
Owner

And also related to pydata/xarray#7129

observingClouds added a commit that referenced this issue Oct 6, 2022
- merge encodings of dataset and compression as output encoding

fixes #139