Move encoding from xarray.Variable to duck arrays? #5082

Open
shoyer opened this issue Mar 27, 2021 · 2 comments

@shoyer
Member

shoyer commented Mar 27, 2021

The encoding property on Variable has always been an awkward part of Xarray's API, and an example of poor separation of concerns. It adds conceptual overhead to all uses of xarray.Variable, but exists only for the (somewhat niche) benefit of Xarray's backend IO functionality. This is particularly problematic if we consider the possible separation of xarray.Variable into a separate package to remove the pandas dependency (#3981).

I think a cleaner way to handle encoding would be to move it from Variable onto array objects, specifically duck array objects that Xarray creates when loading data from disk. As long as these duck arrays don't "propagate" themselves under array operations but rather turn into raw numpy arrays (or whatever is wrapped), this would automatically resolve all issues around propagating encoding attributes (e.g., #5065, #1614). And users who don't care about encoding because they don't use Xarray's IO functionality would never need to think about it.
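To make the proposal concrete, here is a minimal sketch of what such a non-propagating duck array might look like. The class name `EncodedArray` and its layout are hypothetical and are not part of Xarray; the sketch only assumes NumPy's `__array__` and `__array_ufunc__` protocols and ignores details such as `out=` arguments and lazy loading.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class EncodedArray(NDArrayOperatorsMixin):
    """Hypothetical wrapper: holds data plus the encoding it was loaded with."""

    def __init__(self, data, encoding=None):
        self._data = np.asarray(data)
        self.encoding = dict(encoding or {})

    @property
    def dtype(self):
        return self._data.dtype

    @property
    def shape(self):
        return self._data.shape

    def __array__(self, dtype=None, copy=None):
        # Coercion (e.g. np.asarray) yields the plain wrapped array.
        arr = np.asarray(self._data, dtype=dtype)
        return arr.copy() if copy else arr

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap every EncodedArray input so the result is a raw ndarray;
        # the encoding is deliberately not carried over to the result.
        unwrapped = tuple(
            x._data if isinstance(x, EncodedArray) else x for x in inputs
        )
        return getattr(ufunc, method)(*unwrapped, **kwargs)


wrapped = EncodedArray([270, 280, 290], {"dtype": "int16"})
result = wrapped + 1  # plain numpy.ndarray, no encoding attached
```

In this sketch the encoding stays available on the object that came from disk (`wrapped.encoding`), while anything computed from it is an ordinary array, so there is nothing left to propagate.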

@keewis
Collaborator

keewis commented Jun 4, 2021

I think dropping the encoding on the first operation is the right thing to do; otherwise, reloading might cause surprising issues. Consider this:

In [4]: encoding = {
   ...:     "add_offset": 267.39366454179356,
   ...:     "scale_factor": 0.0006500423894110363,
   ...:     "dtype": np.dtype("int16"),
   ...:     "_FillValue": -32767,
   ...: }
   ...: ds = xr.Dataset({"arr": ("x", [270, 280, 290], {}, encoding)})
   ...: ds
Out[4]: 
<xarray.Dataset>
Dimensions:  (x: 3)
Dimensions without coordinates: x
Data variables:
    arr      (x) int64 270 280 290

In [5]: ds.arr[:] = [3, 4, 5]
   ...: ds.to_netcdf("abc.nc")
   ...: with xr.open_dataset("abc.nc").load() as loaded:
   ...:     display(loaded)
   ...:     display(loaded.arr)
   ...: 
<xarray.Dataset>
Dimensions:  (x: 3)
Dimensions without coordinates: x
Data variables:
    arr      (x) float32 258.6 259.6 260.6
<xarray.DataArray 'arr' (x: 3)>
array([258.60706, 259.6068 , 260.60724], dtype=float32)
Dimensions without coordinates: x
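One plausible reading of the reloaded values, not stated in the thread, is that the new data falls outside the range an int16 can represent with this scale and offset, so the packed integers wrap around. The sketch below reproduces the arithmetic with plain NumPy. It is an assumption about the mechanism rather than Xarray's actual encoder, and it relies on NumPy's integer-to-integer `astype` wrapping on overflow.

```python
import numpy as np

add_offset = 267.39366454179356
scale_factor = 0.0006500423894110363

# Range of values that int16 can represent with this scale and offset.
info = np.iinfo(np.int16)
lo = info.min * scale_factor + add_offset
hi = info.max * scale_factor + add_offset

values = np.array([3, 4, 5], dtype=np.float64)

# Pack: invert the scale/offset, round, then narrow to int16.
# Going through int64 first keeps the float-to-int cast in range;
# the int64 -> int16 cast then wraps modulo 2**16.
packed_wide = np.rint((values - add_offset) / scale_factor).astype(np.int64)
packed = packed_wide.astype(np.int16)

# Unpack the way a reader would: packed * scale_factor + add_offset.
decoded = packed * scale_factor + add_offset

print(f"representable range: [{lo:.3f}, {hi:.3f}]")
print("decoded:", decoded)
```

Under these assumptions the decoded values land near 258.6, 259.6 and 260.6, which agrees with the output above: the stale encoding silently turns 3, 4, 5 into unrelated numbers.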

@raybellwaves
Contributor

I tend to do ds["var"].encoding = {} before saving. See also #5407