
Saving sparse Dataset using ds.to_netcdf() #3415

Closed
Timothysit opened this issue Oct 18, 2019 · 1 comment
MCVE Code Sample

import numpy as np
import pandas as pd
import xarray as xr

np.random.seed(123)

times = pd.date_range("2000-01-01", "2001-12-31", name="time")
annual_cycle = np.sin(2 * np.pi * (times.dayofyear.values / 365.25 - 0.28))

base = 10 + 15 * annual_cycle.reshape(-1, 1)
tmin_values = base + 3 * np.random.randn(annual_cycle.size, 3)
tmax_values = base + 10 + 3 * np.random.randn(annual_cycle.size, 3)


ds = xr.Dataset(
    {
        "tmin": (("time", "location"), tmin_values),
        "tmax": (("time", "location"), tmax_values),
    },
    {"time": times, "location": ["IA", "IN", "IL"]},
)

df = ds.to_dataframe()

sparse_ds = xr.Dataset.from_dataframe(df, sparse=True)

sparse_ds.to_netcdf('sparse_ds.nc')

Expected Output

No output (file saved)

Problem Description

I am trying to save a sparse dataset using ds.to_netcdf(), but instead I get a TypeError (full traceback below). I also tried the HDF5 and zarr backends and got similar errors. The only way I have found to save the dataset so far is pickle (a densifying workaround is sketched after the traceback). Am I doing something wrong, or is there currently no way to save sparse datasets?

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-c6538bca6d4f> in <module>
     21 sparse_ds = xr.Dataset.from_dataframe(df, sparse=True)
     22 
---> 23 sparse_ds.to_netcdf('sparse_ds.nc')

~/miniconda3/envs/msi/lib/python3.7/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1538             unlimited_dims=unlimited_dims,
   1539             compute=compute,
-> 1540             invalid_netcdf=invalid_netcdf,
   1541         )
   1542 

~/miniconda3/envs/msi/lib/python3.7/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1072         # to be parallelized with dask
   1073         dump_to_store(
-> 1074             dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1075         )
   1076         if autoclose:

~/miniconda3/envs/msi/lib/python3.7/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1118         variables, attrs = encoder(variables, attrs)
   1119 
-> 1120     store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
   1121 
   1122 

~/miniconda3/envs/msi/lib/python3.7/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    301         self.set_dimensions(variables, unlimited_dims=unlimited_dims)
    302         self.set_variables(
--> 303             variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
    304         )
    305 

~/miniconda3/envs/msi/lib/python3.7/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
    342             )
    343 
--> 344             writer.add(source, target)
    345 
    346     def set_dimensions(self, variables, unlimited_dims=None):

~/miniconda3/envs/msi/lib/python3.7/site-packages/xarray/backends/common.py in add(self, source, target, region)
    187                 target[region] = source
    188             else:
--> 189                 target[...] = source
    190 
    191     def sync(self, compute=True):

~/miniconda3/envs/msi/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in __setitem__(self, key, value)
     51         with self.datastore.lock:
     52             data = self.get_array(needs_lock=False)
---> 53             data[key] = value
     54             if self.datastore.autoclose:
     55                 self.datastore.close(needs_lock=False)

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__setitem__()

TypeError: __array__() takes 1 positional argument but 2 were given
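
A minimal workaround sketch, assuming the data fits in memory once densified (the output filename is illustrative): convert each sparse-backed data variable back to a dense numpy array before writing, then call to_netcdf() as usual.

import sparse

# Densify sparse-backed data variables so the netCDF backend can write them.
dense_ds = sparse_ds.copy()
for name, da in sparse_ds.data_vars.items():
    if isinstance(da.data, sparse.COO):
        dense_ds[name] = (da.dims, da.data.todense())

dense_ds.to_netcdf('sparse_ds_dense.nc')

This trades away the memory savings of the sparse representation, which is why pickle remains the fallback when the dense form is too large.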

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.0.0-31-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1

xarray: 0.13.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.6.0
distributed: 2.6.0
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: None
pytest: None
IPython: 7.8.0
sphinx: None


shoyer commented Oct 20, 2019

Am I doing something wrong or is there currently no way to save sparse datasets?

Unfortunately this is correct -- there is currently no way to save sparse datasets.

We would love to add support for this, but it will need a volunteer to do the implementation. At the very least, we should raise a more informative error message.

See #3213 for discussion about how this could work.
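
For illustration, the more informative error could come from a small guard run before variables reach the backend. This is a hypothetical sketch, not xarray's actual code; the function name and module check are assumptions:

def ensure_dense_variables(variables):
    # Hypothetical pre-write check: fail early with an actionable message
    # instead of netCDF4's opaque "__array__() takes 1 positional argument
    # but 2 were given".
    for name, var in variables.items():
        if type(var.data).__module__.split(".")[0] == "sparse":
            raise TypeError(
                f"variable {name!r} is backed by a sparse array, which "
                "cannot be written to netCDF; densify it first, e.g. via "
                "var.data.todense()"
            )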

shoyer closed this as completed on Oct 20, 2019.