Document that `Variable.encoding` is ignored if encoding is given in `to_netcdf` #7127

observingClouds · 2022-10-04T21:57:48Z

What happened?

After upgrading xarray from 2022.06.0 to 2022.09.0, the output of the example below is written as float64 instead of float32.

What did you expect to happen?

I expected the output to have the same dtype.

Minimal Complete Verifiable Example

import xarray as xr

ds = xr.tutorial.load_dataset("eraint_uvz")
# Request zlib compression only; no dtype, scale_factor or add_offset is given
encoding = {'z': {'zlib': True}}
ds.z.to_netcdf("compressed.nc", encoding=encoding)

MVCE confirmation

Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
Complete example — the example is self-contained, including all data and the text of any traceback.
Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

# xarray version == 2022.06.0

netcdf compressed {
dimensions:
	longitude = 480 ;
	latitude = 241 ;
	level = 3 ;
	month = 2 ;
variables:
	float longitude(longitude) ;
		longitude:_FillValue = NaNf ;
		longitude:units = "degrees_east" ;
		longitude:long_name = "longitude" ;
	float latitude(latitude) ;
		latitude:_FillValue = NaNf ;
		latitude:units = "degrees_north" ;
		latitude:long_name = "latitude" ;
	int level(level) ;
		level:units = "millibars" ;
		level:long_name = "pressure_level" ;
	int month(month) ;
	float z(month, level, latitude, longitude) ;
		z:_FillValue = NaNf ;
		z:number_of_significant_digits = 5 ;
		z:units = "m**2 s**-2" ;
		z:long_name = "Geopotential" ;
		z:standard_name = "geopotential" ;


# xarray version == 2022.09.0

netcdf compressed {
dimensions:
	longitude = 480 ;
	latitude = 241 ;
	level = 3 ;
	month = 2 ;
variables:
	float longitude(longitude) ;
		longitude:_FillValue = NaNf ;
		longitude:units = "degrees_east" ;
		longitude:long_name = "longitude" ;
	float latitude(latitude) ;
		latitude:_FillValue = NaNf ;
		latitude:units = "degrees_north" ;
		latitude:long_name = "latitude" ;
	int level(level) ;
		level:units = "millibars" ;
		level:long_name = "pressure_level" ;
	int month(month) ;
	double z(month, level, latitude, longitude) ;
		z:_FillValue = NaN ;
		z:number_of_significant_digits = 5 ;
		z:units = "m**2 s**-2" ;
		z:long_name = "Geopotential" ;
		z:standard_name = "geopotential" ;

Anything else we need to know?

In addition to the change of dtype from float to double, I wonder whether both outputs should actually be int16, because this is the dtype of the original dataset:

>>> import xarray as xr
>>> ds = xr.tutorial.load_dataset("eraint_uvz")
>>> ds.z.encoding
{'source': '.../.cache/xarray_tutorial_data/e4bb6ebf67663eeab3ff30beae6a5acf-eraint_uvz.nc', 'original_shape': (2, 3, 241, 480), 'dtype': dtype('int16'), '_FillValue': nan, 'scale_factor': -1.7250274674967954, 'add_offset': 66825.5}
>>> ds.z.to_netcdf("original.nc")

netcdf original {
dimensions:
	longitude = 480 ;
	latitude = 241 ;
	level = 3 ;
	month = 2 ;
variables:
	float longitude(longitude) ;
		longitude:_FillValue = NaNf ;
		longitude:units = "degrees_east" ;
		longitude:long_name = "longitude" ;
	float latitude(latitude) ;
		latitude:_FillValue = NaNf ;
		latitude:units = "degrees_north" ;
		latitude:long_name = "latitude" ;
	int level(level) ;
		level:units = "millibars" ;
		level:long_name = "pressure_level" ;
	int month(month) ;
	short z(month, level, latitude, longitude) ;
		z:_FillValue = 0s ;
		z:number_of_significant_digits = 5 ;
		z:units = "m**2 s**-2" ;
		z:long_name = "Geopotential" ;
		z:standard_name = "geopotential" ;
		z:add_offset = 66825.5 ;
		z:scale_factor = -1.7250274674968 ;

Sorry for mixing an issue with a question, but why are add_offset and scale_factor applied and the values saved as float32/float64 when encoding is set? I guess encoding in to_netcdf is overwriting the initial encoding, because

ds.z.to_netcdf("test_w_offset.nc", encoding={"z":{"add_offset":66825.5, "scale_factor":-1.7250274674968, "dtype":'int16'}})

produces the expected output that matches the original one. So I imagine, a good way of setting the output encoding is currently something like

ds.to_netcdf("compressed.nc", encoding={v: {**ds[v].encoding, "zlib": True} for v in ds.data_vars})

when an encoding similar to the input encoding, with additional parameters (e.g. 'zlib'), is requested.
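As a rough illustration of why the decoded values are floats: under the CF packing convention, a stored integer is turned into a physical value via `packed * scale_factor + add_offset`. The sketch below uses the `scale_factor` and `add_offset` reported in `ds.z.encoding` above; the function names `pack` and `unpack` and the sample value are hypothetical and only serve the example. It is plain Python arithmetic, not xarray's actual implementation.

```python
# Values copied from ds.z.encoding in the post above.
SCALE_FACTOR = -1.7250274674967954
ADD_OFFSET = 66825.5


def unpack(packed: int) -> float:
    """CF-style decoding: stored integer -> physical (float) value."""
    return packed * SCALE_FACTOR + ADD_OFFSET


def pack(value: float) -> int:
    """CF-style encoding: physical value -> nearest integer that fits in int16."""
    packed = round((value - ADD_OFFSET) / SCALE_FACTOR)
    if not -32768 <= packed <= 32767:
        raise OverflowError(f"{value} does not fit into int16 with this packing")
    return packed


# Hypothetical geopotential value, chosen only for illustration.
value = 50000.0
packed = pack(value)
restored = unpack(packed)

# The round trip is lossy, but the error is bounded by half a scale step.
print(packed, restored, abs(restored - value))
```

If the `dtype`, `scale_factor` and `add_offset` entries are missing from the encoding used for writing, there is no instruction to pack the data again, so the already decoded floating-point values are what end up in the file.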

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-305.25.1.el8_4.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1

xarray: 2022.6.0  # or 2022.9.0
pandas: 1.5.0
numpy: 1.23.3
scipy: None
netCDF4: 1.6.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.13.2
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.4.1
pip: 22.2.2
conda: None
pytest: None
IPython: 8.3.0
sphinx: None


dcherian · 2023-01-15T16:00:02Z

> I guess encoding in to_netcdf is overwriting the initial encoding, because

Yes, this is right. It would be nice to point this out in the docstring.
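Since the `encoding` argument replaces the stored `Variable.encoding` rather than extending it, one option is to build the full encoding explicitly before writing. The following is a minimal sketch of that idea using plain dictionaries; `merged_encoding` is a hypothetical helper, not part of xarray, and the `stored` dictionary stands in for `{name: ds[name].encoding for name in ds.data_vars}`.

```python
from typing import Any, Dict, Mapping


def merged_encoding(
    stored: Mapping[str, Mapping[str, Any]],
    extra: Mapping[str, Any],
) -> Dict[str, Dict[str, Any]]:
    """Return a new per-variable encoding: the stored one updated with `extra`.

    Keys in `extra` win over keys already present in the stored encoding.
    The input mappings are not modified.
    """
    return {name: {**enc, **extra} for name, enc in stored.items()}


# Stand-in for the encodings xarray keeps on each variable after loading
# (values taken from the ds.z.encoding output shown in the issue).
stored = {
    "z": {
        "dtype": "int16",
        "scale_factor": -1.7250274674967954,
        "add_offset": 66825.5,
    }
}

encoding = merged_encoding(stored, {"zlib": True})
print(encoding)

# With xarray, the result would then be passed along the lines of:
#   ds.to_netcdf("compressed.nc", encoding=encoding)
```

Building a new dictionary instead of mutating `ds[name].encoding` keeps the in-memory dataset unchanged, so later writes are not silently affected.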

* improved docstring of to_netcdf (issue #7127)
* Spelling
* Update xarray/core/dataset.py
---------
Co-authored-by: Tom Nicholas <thomas.nicholas@columbia.edu>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>


observingClouds added the bug and needs triage labels Oct 4, 2022

observingClouds changed the title ~~Dtype changes if any encoding is given~~ Dtype changes if any encoding is given in to_netcdf Oct 4, 2022

This was referenced Oct 4, 2022

eraint_uvz test_full fails observingClouds/xbitinfo#139

Closed

Fix 139 observingClouds/xbitinfo#140

Merged

observingClouds added a commit to observingClouds/xbitinfo that referenced this issue Oct 5, 2022

only allow netcdf4 engine

2b28eee

this is a temporary restriction to ensure only correct encoding for the output is used. Related: pydata/xarray#7127

observingClouds added a commit to observingClouds/xbitinfo that referenced this issue Oct 6, 2022

only allow netcdf4 engine

453d259

this is a temporary restriction to ensure only correct encoding for the output is used. Related: pydata/xarray#7127

dcherian added the topic-documentation label and removed the bug and needs triage labels Jan 15, 2023

dcherian changed the title ~~Dtype changes if any encoding is given in to_netcdf~~ Document that Variable.encoding is ignored if encoding is given in to_netcdf Jan 15, 2023

dcherian added the contrib-good-first-issue label Jan 15, 2023

vallirep added a commit to vallirep/xarray that referenced this issue Jun 27, 2023

improved docstring of to_netcdf (issue pydata#7127)

556326e

vallirep mentioned this issue Jun 27, 2023

improved docstring of to_netcdf (issue #7127) #7947

Merged

1 task

dcherian closed this as completed in #7947 Jul 21, 2023
