
Saving sparse Dataset using ds.to_netcdf() #3415

Closed
Timothysit opened this issue Oct 18, 2019 · 1 comment
MCVE Code Sample

import numpy as np
import pandas as pd
import xarray as xr

np.random.seed(123)

times = pd.date_range("2000-01-01", "2001-12-31", name="time")
annual_cycle = np.sin(2 * np.pi * (times.dayofyear.values / 365.25 - 0.28))

base = 10 + 15 * annual_cycle.reshape(-1, 1)
tmin_values = base + 3 * np.random.randn(annual_cycle.size, 3)
tmax_values = base + 10 + 3 * np.random.randn(annual_cycle.size, 3)


ds = xr.Dataset(
    {
        "tmin": (("time", "location"), tmin_values),
        "tmax": (("time", "location"), tmax_values),
    },
    {"time": times, "location": ["IA", "IN", "IL"]},
)

df = ds.to_dataframe()

sparse_ds = xr.Dataset.from_dataframe(df, sparse=True)

sparse_ds.to_netcdf('sparse_ds.nc')

Expected Output

No output (file saved)

Problem Description

I am trying to save a sparse dataset using ds.to_netcdf(), but instead I get a TypeError (full traceback below). I also tried the HDF5 and zarr backends and got similar errors. The only way I have found to save the dataset so far is pickle (a densifying workaround is sketched after the traceback). Am I doing something wrong, or is there currently no way to save sparse datasets?

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-c6538bca6d4f> in <module>
     21 sparse_ds = xr.Dataset.from_dataframe(df, sparse=True)
     22 
---> 23 sparse_ds.to_netcdf('sparse_ds.nc')

~/miniconda3/envs/msi/lib/python3.7/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1538             unlimited_dims=unlimited_dims,
   1539             compute=compute,
-> 1540             invalid_netcdf=invalid_netcdf,
   1541         )
   1542 

~/miniconda3/envs/msi/lib/python3.7/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1072         # to be parallelized with dask
   1073         dump_to_store(
-> 1074             dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1075         )
   1076         if autoclose:

~/miniconda3/envs/msi/lib/python3.7/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1118         variables, attrs = encoder(variables, attrs)
   1119 
-> 1120     store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
   1121 
   1122 

~/miniconda3/envs/msi/lib/python3.7/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    301         self.set_dimensions(variables, unlimited_dims=unlimited_dims)
    302         self.set_variables(
--> 303             variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
    304         )
    305 

~/miniconda3/envs/msi/lib/python3.7/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
    342             )
    343 
--> 344             writer.add(source, target)
    345 
    346     def set_dimensions(self, variables, unlimited_dims=None):

~/miniconda3/envs/msi/lib/python3.7/site-packages/xarray/backends/common.py in add(self, source, target, region)
    187                 target[region] = source
    188             else:
--> 189                 target[...] = source
    190 
    191     def sync(self, compute=True):

~/miniconda3/envs/msi/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in __setitem__(self, key, value)
     51         with self.datastore.lock:
     52             data = self.get_array(needs_lock=False)
---> 53             data[key] = value
     54             if self.datastore.autoclose:
     55                 self.datastore.close(needs_lock=False)

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__setitem__()

TypeError: __array__() takes 1 positional argument but 2 were given
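
A minimal workaround sketch, assuming the data fits in memory once densified (the output filename is illustrative): convert each sparse-backed data variable back to a dense numpy array before writing, then call to_netcdf() as usual.

import sparse

# Densify sparse-backed data variables so the netCDF backend can write them.
dense_ds = sparse_ds.copy()
for name, da in sparse_ds.data_vars.items():
    if isinstance(da.data, sparse.COO):
        dense_ds[name] = (da.dims, da.data.todense())

dense_ds.to_netcdf('sparse_ds_dense.nc')

This trades away the memory savings of the sparse representation, which is why pickle remains the fallback when the dense form is too large.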

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.0.0-31-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1

xarray: 0.13.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.6.0
distributed: 2.6.0
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: None
pytest: None
IPython: 7.8.0
sphinx: None


shoyer commented Oct 20, 2019

Am I doing something wrong or is there currently no way to save sparse datasets?

Unfortunately this is correct -- there is currently no way to save sparse datasets.

We would love to add support for this, but it will need a volunteer to do the implementation. At the very least, we should raise a more informative error message.

See #3213 for discussion about how this could work.
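
For illustration, the more informative error could come from a small guard run before variables reach the backend. This is a hypothetical sketch, not xarray's actual code; the function name and module check are assumptions:

def ensure_dense_variables(variables):
    # Hypothetical pre-write check: fail early with an actionable message
    # instead of netCDF4's opaque "__array__() takes 1 positional argument
    # but 2 were given".
    for name, var in variables.items():
        if type(var.data).__module__.split(".")[0] == "sparse":
            raise TypeError(
                f"variable {name!r} is backed by a sparse array, which "
                "cannot be written to netCDF; densify it first, e.g. via "
                "var.data.todense()"
            )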

shoyer closed this as completed on Oct 20, 2019.