Document that Variable.encoding is ignored if encoding is given in to_netcdf #7127

Closed
4 tasks done
observingClouds opened this issue Oct 4, 2022 · 1 comment · Fixed by #7947

Comments

@observingClouds
Contributor

observingClouds commented Oct 4, 2022

What happened?

With a change from xarray version 2022.06.0 to 2022.09.0 the following output is no longer written as float32 but float64.

What did you expect to happen?

I expected the output to have the same dtype.

Minimal Complete Verifiable Example

import xarray as xr
ds = xr.tutorial.load_dataset("eraint_uvz")
encoding = {"z": {"zlib": True}}
ds.z.to_netcdf("compressed.nc", encoding=encoding)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

# xarray version == 2022.06.0

netcdf compressed {
dimensions:
	longitude = 480 ;
	latitude = 241 ;
	level = 3 ;
	month = 2 ;
variables:
	float longitude(longitude) ;
		longitude:_FillValue = NaNf ;
		longitude:units = "degrees_east" ;
		longitude:long_name = "longitude" ;
	float latitude(latitude) ;
		latitude:_FillValue = NaNf ;
		latitude:units = "degrees_north" ;
		latitude:long_name = "latitude" ;
	int level(level) ;
		level:units = "millibars" ;
		level:long_name = "pressure_level" ;
	int month(month) ;
	float z(month, level, latitude, longitude) ;
		z:_FillValue = NaNf ;
		z:number_of_significant_digits = 5 ;
		z:units = "m**2 s**-2" ;
		z:long_name = "Geopotential" ;
		z:standard_name = "geopotential" ;


# xarray version == 2022.09.0

netcdf compressed {
dimensions:
	longitude = 480 ;
	latitude = 241 ;
	level = 3 ;
	month = 2 ;
variables:
	float longitude(longitude) ;
		longitude:_FillValue = NaNf ;
		longitude:units = "degrees_east" ;
		longitude:long_name = "longitude" ;
	float latitude(latitude) ;
		latitude:_FillValue = NaNf ;
		latitude:units = "degrees_north" ;
		latitude:long_name = "latitude" ;
	int level(level) ;
		level:units = "millibars" ;
		level:long_name = "pressure_level" ;
	int month(month) ;
	double z(month, level, latitude, longitude) ;
		z:_FillValue = NaN ;
		z:number_of_significant_digits = 5 ;
		z:units = "m**2 s**-2" ;
		z:long_name = "Geopotential" ;
		z:standard_name = "geopotential" ;

Anything else we need to know?

In addition to the change of dtype from float to double, I wonder whether both outputs should actually be int16, because this is the dtype of the original dataset:

>>> import xarray as xr
>>> ds = xr.tutorial.load_dataset("eraint_uvz")
>>> ds.z.encoding
{'source': '.../.cache/xarray_tutorial_data/e4bb6ebf67663eeab3ff30beae6a5acf-eraint_uvz.nc', 'original_shape': (2, 3, 241, 480), 'dtype': dtype('int16'), '_FillValue': nan, 'scale_factor': -1.7250274674967954, 'add_offset': 66825.5}
>>> ds.z.to_netcdf("original.nc")
netcdf original {
dimensions:
	longitude = 480 ;
	latitude = 241 ;
	level = 3 ;
	month = 2 ;
variables:
	float longitude(longitude) ;
		longitude:_FillValue = NaNf ;
		longitude:units = "degrees_east" ;
		longitude:long_name = "longitude" ;
	float latitude(latitude) ;
		latitude:_FillValue = NaNf ;
		latitude:units = "degrees_north" ;
		latitude:long_name = "latitude" ;
	int level(level) ;
		level:units = "millibars" ;
		level:long_name = "pressure_level" ;
	int month(month) ;
	short z(month, level, latitude, longitude) ;
		z:_FillValue = 0s ;
		z:number_of_significant_digits = 5 ;
		z:units = "m**2 s**-2" ;
		z:long_name = "Geopotential" ;
		z:standard_name = "geopotential" ;
		z:add_offset = 66825.5 ;
		z:scale_factor = -1.7250274674968 ;
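
The `short z` variable in this dump holds packed 16-bit integers. Under the CF packing convention a reader recovers physical values as `packed * scale_factor + add_offset`. Below is a minimal sketch of that arithmetic in plain Python, using the two attribute values from the dump above. The helper names `pack` and `unpack` are illustrative, and the rounding shown is an assumption that may differ in detail from what xarray does internally.

```python
# Attribute values copied from the ncdump output above.
SCALE_FACTOR = -1.7250274674968
ADD_OFFSET = 66825.5

INT16_MIN, INT16_MAX = -32768, 32767


def pack(value: float) -> int:
    """Map a physical value to the integer that would be stored on disk."""
    packed = round((value - ADD_OFFSET) / SCALE_FACTOR)
    if not INT16_MIN <= packed <= INT16_MAX:
        raise OverflowError(f"{value} does not fit into int16 with this packing")
    return packed


def unpack(packed: int) -> float:
    """Recover the physical value from the stored integer."""
    return packed * SCALE_FACTOR + ADD_OFFSET


value = 50000.0  # arbitrary illustrative value, not taken from the dataset
stored = pack(value)
restored = unpack(stored)
# The round trip is lossy: the error is at most half a packing step.
print(stored, restored, abs(restored - value) <= abs(SCALE_FACTOR) / 2)
```

If `dtype`, `scale_factor` and `add_offset` are missing from the encoding used for writing, this packing step is skipped, which is consistent with the `float` and `double` variables in the earlier dumps.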

Sorry for mixing an issue with a question, but why are add_offset and scale_factor applied and the values saved as float32/float64 when encoding is set? I guess encoding in to_netcdf is overwriting the initial encoding, because

ds.z.to_netcdf("test_w_offset.nc", encoding={"z":{"add_offset":66825.5, "scale_factor":-1.7250274674968, "dtype":'int16'}})

produces the expected output that matches the original one. So I imagine, a good way of setting the output encoding is currently something like

ds.to_netcdf("compressed.nc", encoding={v: {**ds[v].encoding, "zlib": True} for v in ds.data_vars})

when an encoding similar to the input encoding, with additional parameters (e.g. 'zlib'), is requested.
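
The same merge can be written as a small helper that works on plain dictionaries. This is only a sketch: the function name `with_extra_encoding` and its `drop` parameter are hypothetical, and leaving out the read-time keys `source` and `original_shape` is a tidiness choice made here, not something the issue requires.

```python
from typing import Any, Iterable, Mapping


def with_extra_encoding(
    var_encodings: Mapping[str, Mapping[str, Any]],
    extra: Mapping[str, Any],
    drop: Iterable[str] = ("source", "original_shape"),
) -> dict[str, dict[str, Any]]:
    """Return a new per-variable encoding dict: original keys plus `extra`.

    Keys listed in `drop` are left out; the inputs are not modified.
    """
    dropped = set(drop)
    merged = {}
    for name, enc in var_encodings.items():
        kept = {k: v for k, v in enc.items() if k not in dropped}
        merged[name] = {**kept, **extra}  # `extra` wins on key clashes
    return merged


# Encoding of `z` as printed earlier in this issue, with the dtype written
# as a string so the example needs no third-party imports.
z_encoding = {
    "source": ".../.cache/xarray_tutorial_data/e4bb6ebf67663eeab3ff30beae6a5acf-eraint_uvz.nc",
    "original_shape": (2, 3, 241, 480),
    "dtype": "int16",
    "_FillValue": float("nan"),
    "scale_factor": -1.7250274674967954,
    "add_offset": 66825.5,
}

encoding = with_extra_encoding({"z": z_encoding}, {"zlib": True})
print(sorted(encoding["z"]))
```

With a real dataset, the input mapping could be built as `{v: ds[v].encoding for v in ds.data_vars}` and the result passed as the `encoding` argument of `to_netcdf`.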

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-305.25.1.el8_4.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1

xarray: 2022.6.0  # or 2022.9.0
pandas: 1.5.0
numpy: 1.23.3
scipy: None
netCDF4: 1.6.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.13.2
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.4.1
pip: 22.2.2
conda: None
pytest: None
IPython: 8.3.0
sphinx: None

observingClouds added the bug and needs triage labels Oct 4, 2022
observingClouds changed the title "Dtype changes if any encoding is given" to "Dtype changes if any encoding is given in to_netcdf" Oct 4, 2022
observingClouds added a commit to observingClouds/xbitinfo that referenced this issue Oct 5, 2022
this is a temporary restriction to ensure only correct
encoding for the output is used.

Related: pydata/xarray#7127
observingClouds added a commit to observingClouds/xbitinfo that referenced this issue Oct 6, 2022
this is a temporary restriction to ensure only correct
encoding for the output is used.

Related: pydata/xarray#7127
@dcherian
Contributor

> I guess encoding in to_netcdf is overwriting the initial encoding, because

Yes this is right. It would be nice to point this out in the docstring.
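
The overwrite behaviour confirmed here can be pictured with a toy model in plain Python. This is not xarray's implementation; the function name `effective_encoding` is hypothetical and only illustrates "replace, do not merge" for a single variable.

```python
def effective_encoding(name, variable_encoding, to_netcdf_encoding):
    """Toy model: an entry in `to_netcdf_encoding` replaces the variable's
    own encoding wholesale instead of being merged into it."""
    if name in to_netcdf_encoding:
        return dict(to_netcdf_encoding[name])
    return dict(variable_encoding)


# Relevant part of ds.z.encoding from the issue (dtype written as a string).
z_variable_encoding = {
    "dtype": "int16",
    "scale_factor": -1.7250274674967954,
    "add_offset": 66825.5,
}

# No encoding argument: the variable's own encoding is used.
no_argument = effective_encoding("z", z_variable_encoding, {})

# encoding={"z": {"zlib": True}}: dtype, scale_factor and add_offset are gone.
only_zlib = effective_encoding("z", z_variable_encoding, {"z": {"zlib": True}})

print(no_argument)
print(only_zlib)
```

In this model the second call carries no `dtype`, `scale_factor` or `add_offset`, which matches the unpacked floating-point output reported in the issue.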

dcherian added the topic-documentation label and removed the bug and needs triage labels Jan 15, 2023
dcherian changed the title "Dtype changes if any encoding is given in to_netcdf" to "Document that Variable.encoding is ignored if encoding is given in to_netcdf" Jan 15, 2023
vallirep added a commit to vallirep/xarray that referenced this issue Jun 27, 2023
dcherian added a commit that referenced this issue Jul 21, 2023
* improved docstring of to_netcdf (issue #7127)

* Spelling

* Update xarray/core/dataset.py

---------

Co-authored-by: Tom Nicholas <thomas.nicholas@columbia.edu>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
dcherian added a commit to dcherian/xarray that referenced this issue Jul 24, 2023
…lazy-array

* upstream/main: (153 commits)
  Add HDF5 Section to read/write docs page (pydata#8012)
  [pre-commit.ci] pre-commit autoupdate (pydata#8014)
  Update interpolate_na in dataset.py (pydata#7974)
  improved docstring of to_netcdf (issue pydata#7127) (pydata#7947)
  Expose "Coordinates" as part of Xarray's public API (pydata#7368)
  Core team member guide (pydata#7999)
  join together duplicate entries in the text `repr` (pydata#7225)
  Update copyright year in README (pydata#8007)
  Allow opening datasets with nD dimenson coordinate variables. (pydata#7989)
  Move whats-new entry
  [pre-commit.ci] pre-commit autoupdate (pydata#7997)
  Add documentation on custom indexes (pydata#6975)
  Use variable name in all exceptions raised in `as_variable` (pydata#7995)
  Bump pypa/gh-action-pypi-publish from 1.8.7 to 1.8.8 (pydata#7994)
  New whatsnew section
  Remove future release notes before this release
  Update whats-new.rst for new release (pydata#7993)
  Remove hue_style from plot1d docstring (pydata#7925)
  Add new what's new section (pydata#7986)
  Release summary for v2023.07.0 (pydata#7979)
  ...