`eraint_uvz` `test_full` fails #139

aaronspring · 2022-10-02T13:01:58Z

Description

https://github.com/observingClouds/xbitinfo/actions/runs/3168552683/jobs/5159830575

assert compressed_size < ori_size
assert 9640058 < 4186398

Did the defaults in to_netcdf change with the new xarray release?

observingClouds · 2022-10-02T19:07:01Z

I checked the sizes for the different xarray releases:

2022.06.0:

ori_size = 4186398
compressed_size = 4131352
bitrounded_compressed_size = 496448

2022.09.0:

ori_size = 4186398
compressed_size = 9640058
bitrounded_compressed_size = 447607
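
For a quick sanity check, the numbers above can be turned into ratios relative to ori_size. This is only arithmetic on the values reported in this comment; the dictionary layout and the names `sizes`, `ratios` and `passes` are made up for illustration.

```python
# Sizes in bytes, copied from the measurements reported above.
sizes = {
    "2022.06.0": {
        "ori_size": 4186398,
        "compressed_size": 4131352,
        "bitrounded_compressed_size": 496448,
    },
    "2022.09.0": {
        "ori_size": 4186398,
        "compressed_size": 9640058,
        "bitrounded_compressed_size": 447607,
    },
}

ratios = {}
for release, s in sizes.items():
    ratios[release] = {
        "compressed": s["compressed_size"] / s["ori_size"],
        "bitrounded": s["bitrounded_compressed_size"] / s["ori_size"],
    }
    print(
        f"{release}: compressed/original = {ratios[release]['compressed']:.2f}, "
        f"bitrounded/original = {ratios[release]['bitrounded']:.2f}"
    )

# The failing check is `compressed_size < ori_size`, i.e. a ratio below 1.
passes = {release: r["compressed"] < 1 for release, r in ratios.items()}
```

With these numbers the plain compressed file is only marginally smaller than the original under 2022.06.0 and more than twice as large under 2022.09.0, while the bitrounded file stays far below the original in both cases.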

observingClouds · 2022-10-02T20:07:30Z

Okay, I think I know what is going on here. There are two issues:

1. The dtype of the output changed from float32 to float64 between xarray 2022.06.0 and 2022.09.0, and as a consequence the compression is less efficient.
2. Passing the encoding argument to xr.Dataset.to_netcdf overwrites the original encoding. As a consequence, the variables are written as floats (with add_offset and scale_factor already applied) instead of the int16 the original dataset is stored in. Zlib on float32 is efficient enough to compress better than int16 with scale_factor and add_offset, but it can't keep up when the data is float64.
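
To put the dtype change in numbers, here is a rough sketch of the uncompressed size of a single variable for each dtype mentioned above. It assumes the shape (2, 3, 241, 480) given as original_shape in the original encoding quoted in this comment and ignores file headers and coordinates; the names `n_values` and `raw_size` are just for illustration.

```python
import math

# Shape taken from 'original_shape' in the original encoding of the dataset.
shape = (2, 3, 241, 480)
n_values = math.prod(shape)

# Storage width in bytes of one value for each dtype discussed here.
bytes_per_value = {"int16": 2, "float32": 4, "float64": 8}

# Uncompressed payload of one variable, before zlib is applied.
raw_size = {name: n_values * width for name, width in bytes_per_value.items()}

for name, size in raw_size.items():
    print(f"{name:>8}: {size} bytes per variable before compression")
```

So zlib has to make up a factor of two when the packed int16 data is written as float32, and a factor of four when it is written as float64.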

For reference:

the original encoding:

{
    'source': '.../xarray_tutorial_data/e4bb6ebf67663eeab3ff30beae6a5acf-eraint_uvz.nc',
    'original_shape': (2, 3, 241, 480),
    'dtype': dtype('int16'),
    '_FillValue': nan,
    'scale_factor': -1.7250274674967954,
    'add_offset': 66825.5
}

the encoding after the call of get_compress_encoding_nc within to_compressed_netcdf:

{
    'zlib': True,
    'shuffle': True,
    'complevel': 9,
    'chunksizes': (2, 3, 241, 480)
}
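
One way to keep the packing while still adding compression is to merge the two dictionaries instead of replacing one with the other. The following is a minimal sketch of that idea, not the actual xbitinfo code: `merge_encodings` is a hypothetical helper, the dtype is written as a plain string to keep the example free of dependencies, and dropping source and original_shape is an assumption that these keys only describe where the data came from.

```python
def merge_encodings(original, compression, drop=("source", "original_shape")):
    """Combine a variable's original encoding with compression settings.

    Hypothetical helper: keys listed in `drop` are removed, and compression
    settings take precedence if a key appears in both dictionaries.
    """
    merged = {key: value for key, value in original.items() if key not in drop}
    merged.update(compression)
    return merged


# Original encoding as quoted above (path left elided as in the thread).
original_encoding = {
    "source": ".../xarray_tutorial_data/e4bb6ebf67663eeab3ff30beae6a5acf-eraint_uvz.nc",
    "original_shape": (2, 3, 241, 480),
    "dtype": "int16",
    "_FillValue": float("nan"),
    "scale_factor": -1.7250274674967954,
    "add_offset": 66825.5,
}

# Compression encoding as quoted above.
compression_encoding = {
    "zlib": True,
    "shuffle": True,
    "complevel": 9,
    "chunksizes": (2, 3, 241, 480),
}

merged = merge_encodings(original_encoding, compression_encoding)
print(sorted(merged))
```

The merged dictionary keeps dtype, scale_factor and add_offset, so the data would still be packed as int16, and zlib, shuffle and complevel are applied on top of that.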

observingClouds · 2022-10-04T23:20:06Z

Related upstream issue: pydata/xarray#7127

observingClouds · 2022-10-05T00:12:56Z

And also related to pydata/xarray#7129

aaronspring added the testing label Oct 2, 2022

observingClouds mentioned this issue Oct 2, 2022

Fix 139 #140

Merged

observingClouds closed this as completed in #140 Oct 6, 2022

observingClouds added a commit that referenced this issue Oct 6, 2022

Merge pull request #140 from observingClouds/fix_139

c897de3

- merge encodings of dataset and compression as output encoding fixes #139
