NetCDF encoded data not automatically decoded back into original dtype #4973

Closed
chrism0dwk opened this issue Feb 28, 2021 · 2 comments

chrism0dwk commented Feb 28, 2021

What happened: When reading an encoded netCDF4 file, encoded variables are not converted back to their original dtype in the resulting xarray Dataset.

What you expected to happen:
As with the raw netCDF4 package, if an xarray.DataArray of dtype float64 is encoded into a netCDF4 file as a float32, it should be converted back to the original float64 when the netCDF4 dataset is read back in.

Minimal Complete Verifiable Example:

import xarray as xr
import numpy as np
foo = xr.DataArray(np.random.uniform(size=[100,100]).astype(np.float64))
foo.dtype  # float64
ds = xr.Dataset({'foo': foo})
ds.to_netcdf("foo.nc", encoding={'foo': {'dtype': 'float32', 'scale_factor': 1.0, 'add_offset': 0.0}})
ds1 = xr.open_dataset("foo.nc")
ds1['foo'].dtype  # float32, not float64 as expected
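The behaviour reported above can be checked, and worked around by hand, with a short round trip. This is a minimal sketch, not xarray's recommended approach: it assumes a netCDF backend (netCDF4, h5netcdf or scipy) is installed, uses a temporary file instead of "foo.nc", and encodes with `dtype` only, because how `scale_factor`/`add_offset` affect the decoded dtype may differ between xarray versions. The names `on_disk_dtype`, `restored` and `max_err` are illustrative.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# A float64 array, as in the example above (dimension names are arbitrary).
foo = xr.DataArray(np.random.uniform(size=(100, 100)), dims=("x", "y"))
ds = xr.Dataset({"foo": foo})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.nc")
    # Store the variable on disk as float32.
    ds.to_netcdf(path, encoding={"foo": {"dtype": "float32"}})
    with xr.open_dataset(path) as opened:
        ds1 = opened.load()  # read into memory so the file can be closed

# xarray records the on-disk dtype in .encoding after reading.
on_disk_dtype = ds1["foo"].encoding.get("dtype")

# Explicit upcast: restores the dtype, but not the precision lost on write.
restored = ds1["foo"].astype(np.float64)
max_err = float(np.abs(restored - foo).max())

print(on_disk_dtype, ds1["foo"].dtype, restored.dtype, max_err)
```

The upcast restores the dtype only; the values keep float32 precision, so `max_err` is small but generally not zero.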

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.7 (default, Mar 23 2020, 22:36:06)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-66-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.17.0
pandas: 1.1.5
numpy: 1.19.5
scipy: None
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.02.0
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0
pip: 20.2.2
conda: None
pytest: None
IPython: 7.21.0
sphinx: None

chrism0dwk changed the title from "NetCDF encoded data not read back into original dtype" to "NetCDF encoded data not automatically decoded back into original dtype" on Feb 28, 2021
mathause (Collaborator) commented Mar 2, 2021

What would a netCDF file look like that converts back to float64? Is that information saved in an attribute?
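One way to look at what the file actually stores is to open it with CF decoding turned off and list the variable's raw dtype and attributes. The sketch below assumes a netCDF backend is installed and writes to a temporary file; `raw_dtype` and `raw_attrs` are illustrative names.

```python
import os
import tempfile

import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"foo": xr.DataArray(np.random.uniform(size=(10, 10)), dims=("x", "y"))}
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.nc")
    ds.to_netcdf(
        path,
        encoding={"foo": {"dtype": "float32", "scale_factor": 1.0, "add_offset": 0.0}},
    )
    # decode_cf=False returns the variable exactly as stored on disk.
    with xr.open_dataset(path, decode_cf=False) as raw:
        raw_dtype = raw["foo"].dtype
        raw_attrs = dict(raw["foo"].attrs)

print(raw_dtype)   # the storage dtype
print(raw_attrs)   # attributes written alongside the data
```

The attributes carry the packing parameters (`scale_factor`, `add_offset`), and the variable itself carries the storage dtype, but no attribute written here names the in-memory dtype the data had before it was saved.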

chrism0dwk (Author) commented

@mathause Maybe I'm misunderstanding the concept of encoding in xarray. The "Writing encoded data" section of the xarray documentation states:

The encoding argument takes a dictionary with variable names as keys and variable specific encodings as values. These encodings are saved as attributes on the netCDF variables on disk, which allows xarray to faithfully read encoded data back into memory.

I took this to imply that a float64 dataset could be "compressed" to float32 within the netCDF4 file and transparently restored to a float64 data structure on reading. Having looked more closely at the NetCDF4 spec, I can't see any way to save this information into the file without adding an arbitrary non-spec attribute. Would be a cool feature, though.
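The "arbitrary non-spec attribute" idea could be sketched roughly as follows. Everything here is hypothetical: the attribute name `original_dtype` and the helpers `save_with_dtype_hint` and `open_with_dtype_hint` are not part of xarray or of any netCDF convention, and other tools reading the file would ignore the attribute. A netCDF backend is assumed to be installed.

```python
import os
import tempfile

import numpy as np
import xarray as xr


def save_with_dtype_hint(ds, path, name, disk_dtype="float32"):
    """Write `ds`, storing `name` as `disk_dtype` plus a hint of its in-memory dtype."""
    out = ds.copy()
    # Hypothetical, non-standard attribute recording the in-memory dtype.
    out[name] = out[name].assign_attrs(original_dtype=str(ds[name].dtype))
    out.to_netcdf(path, encoding={name: {"dtype": disk_dtype}})


def open_with_dtype_hint(path):
    """Open a file and cast every variable that carries the hint back to that dtype."""
    with xr.open_dataset(path) as opened:
        ds = opened.load()
    for name in list(ds.data_vars):
        hint = ds[name].attrs.get("original_dtype")
        if hint is not None:
            ds[name] = ds[name].astype(hint)
    return ds


ds = xr.Dataset(
    {"foo": xr.DataArray(np.random.uniform(size=(10, 10)), dims=("x", "y"))}
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.nc")
    save_with_dtype_hint(ds, path, "foo")
    ds2 = open_with_dtype_hint(path)

print(ds2["foo"].dtype)
```

This restores the dtype but not the digits: the values read back have already been rounded to float32 precision on disk.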
