to_dict without data #2659

rabernat · 2019-01-07T14:09:25Z

This PR provides the ability to export Datasets and DataArrays to dictionaries without the actual data. This could be useful for generating listings of dataset contents to feed into search indices or other automated data-discovery tools.

In the process of doing this, I refactored the core dictionary export function to live in the Variable class, since the same code was duplicated in several places.

Closes #2656 (dataset info in .json format)
Tests added
Fully documented, including whats-new.rst for all changes and api.rst for new API
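Since the linked issue asks for dataset info in .json format, here is a hedged sketch of the last step: the schema returned with `data=False` holds only dicts, strings, ints and tuples, so the standard library's `json` module can serialize it directly. The dict literal is copied from the sample `to_dict(data=False)` output posted further down this thread. One caveat: JSON has no tuple type, so `dims` and `shape` come back as lists.

```python
import json

# Schema copied from the sample output of ds.to_dict(data=False) in this thread.
schema = {'attrs': {},
          'coords': {'t': {'attrs': {}, 'dims': ('t',), 'dtype': '<U1', 'shape': (10,)}},
          'data_vars': {'a': {'attrs': {}, 'dims': ('t',), 'dtype': 'float64', 'shape': (10,)},
                        'b': {'attrs': {}, 'dims': ('t',), 'dtype': 'float64', 'shape': (10,)}},
          'dims': {'t': 10}}

text = json.dumps(schema, sort_keys=True)
restored = json.loads(text)

# Tuples are written as JSON arrays, so they are read back as lists.
print(restored['data_vars']['a']['shape'])  # [10]
```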

pep8speaks · 2019-01-07T14:09:40Z

Hello @rabernat! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on January 08, 2019 at 08:43 Hours UTC

shoyer

Looks good, just a couple of nits

shoyer · 2019-01-07T16:37:24Z

xarray/core/dataarray.py

-
-        d.update({'data': ensure_us_time_resolution(self.values).tolist(),
-                  'name': self.name})
+            d['coords'].update({k: self.coords[k].variable.to_dict(data=data)})


This is only adding one item, so just using indexing for assignment seems cleaner: d['coords'][k] = self.coords[k].variable.to_dict(data=data)

shoyer · 2019-01-07T16:38:12Z

xarray/tests/test_dataarray.py

@@ -2909,6 +2909,13 @@ def test_to_and_from_dict(self):
                ValueError, "cannot convert dict without the key 'data'"):
            DataArray.from_dict(d)

+        # check the data=False option
+        expected_no_data = {**expected}


I guess this is a fancy Py3 way of doing a copy? :)

I would lean towards the more explicit expected.copy() given that you aren't inserting any extra fields here.
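Both spellings produce a shallow copy, which matters if a test then edits the nested per-variable dicts: those edits leak back into `expected`. A small stdlib illustration on a toy dict (not the actual test fixture); `copy.deepcopy` is the usual way to get a fully independent copy.

```python
import copy

expected = {"data_vars": {"a": {"dims": ("t",), "data": [1.0, 2.0]}}}

# {**d} and d.copy() are equivalent: both copy only the top level.
unpacked = {**expected}
copied = expected.copy()
assert unpacked == copied
assert unpacked["data_vars"] is expected["data_vars"]  # nested dict is shared

# Mutating a nested entry through a shallow copy also changes the original.
del copied["data_vars"]["a"]["data"]
print("data" in expected["data_vars"]["a"])  # False

# deepcopy gives a fully independent structure.
expected = {"data_vars": {"a": {"dims": ("t",), "data": [1.0, 2.0]}}}
independent = copy.deepcopy(expected)
del independent["data_vars"]["a"]["data"]
print("data" in expected["data_vars"]["a"])  # True
```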

rabernat · 2019-01-07T17:49:03Z

It just occurred to me that it would be nice to have some extra info about the data, such as dtype.

fmaussion · 2019-01-07T19:14:41Z

xarray/tests/test_dataset.py

@@ -3045,11 +3045,20 @@ def test_to_and_from_dict(self):
        # check roundtrip
        assert_identical(ds, Dataset.from_dict(actual))

+        # check the data=False option
+        expected_no_data = expected.copy()
+        print(expected_no_data)


There is a forgotten print here!

fmaussion · 2019-01-07T19:16:05Z

It just occurred to me that it would be nice to have some extra info about the data, such as dtype.

I was about to ask about it ;-). What other attribute than dtype were you thinking about?

shoyer · 2019-01-07T20:06:55Z

What other attribute than dtype were you thinking about?

Shape might also be a good idea.

rabernat · 2019-01-08T08:48:03Z

Here is what the .to_dict(data=False) output looks like for the test example:

import numpy as np
from collections import OrderedDict
from xarray import Dataset
x = np.random.randn(10)
y = np.random.randn(10)
t = list('abcdefghij')
ds = Dataset(OrderedDict([('a', ('t', x)),
                          ('b', ('t', y)), ('t', ('t', t))]))
ds.to_dict(data=False)

{'attrs': {},
 'coords': {'t': {'attrs': {},
   'dims': ('t',),
   'dtype': '<U1',
   'shape': (10,)}},
 'data_vars': {'a': {'attrs': {},
   'dims': ('t',),
   'dtype': 'float64',
   'shape': (10,)},
  'b': {'attrs': {}, 'dims': ('t',), 'dtype': 'float64', 'shape': (10,)}},
 'dims': {'t': 10}}

The one thing I don't like about this is the empty attributes. Maybe we could change it so that attrs is only included if it is non-empty. Alternatively, we could add a squeeze_attrs option.
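The "only include non-empty attrs" idea could also be applied as a post-processing step on the returned dict. `drop_empty_attrs` below is a hypothetical helper, not an xarray option; it assumes the to_dict layout shown above, where nested containers are dicts and `attrs` appears as a key at the dataset and variable levels.

```python
from typing import Any


def drop_empty_attrs(obj: Any) -> Any:
    """Return a copy of a to_dict()-style structure without empty 'attrs' entries."""
    if isinstance(obj, dict):
        return {
            key: drop_empty_attrs(value)
            for key, value in obj.items()
            # Skip only the 'attrs' key, and only when it is an empty dict.
            if not (key == "attrs" and value == {})
        }
    # Leaves (strings, ints, tuples) are returned unchanged.
    return obj


schema = {'attrs': {},
          'coords': {'t': {'attrs': {}, 'dims': ('t',), 'dtype': '<U1', 'shape': (10,)}},
          'dims': {'t': 10}}
print(drop_empty_attrs(schema))
# {'coords': {'t': {'dims': ('t',), 'dtype': '<U1', 'shape': (10,)}}, 'dims': {'t': 10}}
```

Doing this outside to_dict keeps the exported structure uniform by default, which is the trade-off weighed in the replies below.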

dcherian · 2019-01-08T22:30:44Z

Does this help with #2347?

rabernat · 2019-01-11T08:44:25Z

Maybe we could change it so that attrs is only included if it is non-empty.

How do people feel about this? If folks are fine with it as is, then LGTM.

fmaussion · 2019-01-11T10:29:22Z

Maybe we could change it so that attrs is only included if it is non-empty.

How do people feel about this? If folks are fine with it as is, then LGTM.

I'd rather have an empty dict as in the current implementation (like xarray)

rabernat · 2019-01-18T13:55:56Z

Let's get this merged?

jhamman · 2019-01-18T21:20:53Z

xarray/core/dataarray.py

+        ----------
+        data : bool, optional
+            Whether to include the actual data in the dictionary. When set to
+            False, returns just the schema.


It could be useful to allow including the data for the coordinates but not the data variables?

I'm thinking something like

ds.to_dict(data='coords')
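A hedged sketch of how such a tri-state `data` argument might be resolved per section. The `'coords'` value is only the proposal above (not merged in this PR), and `include_data` is a hypothetical helper, not part of xarray.

```python
from typing import Union


def include_data(data: Union[bool, str], section: str) -> bool:
    """Decide whether variables in `section` ('coords' or 'data_vars') keep their values.

    data=True     -> every variable keeps its values
    data=False    -> schema only (dtype/shape) everywhere
    data='coords' -> coordinate values kept, data variables reduced to schema
    """
    if data is True or data is False:
        return data
    if data == "coords":
        return section == "coords"
    raise ValueError(f"unsupported value for data: {data!r}")


print(include_data("coords", "coords"), include_data("coords", "data_vars"))
```

A to_dict implementation would then pass `include_data(data, 'coords')` or `include_data(data, 'data_vars')` down to its variable-level export, so the variable code itself stays boolean.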

shoyer · 2019-01-21T20:04:49Z

I think we should merge this -- @rabernat feel free to go ahead and do that.

We can leave data='coords' for a follow-up, if anyone has real use-cases for it.



rabernat · 2019-02-06T20:40:15Z

Too late, I found this:

https://binary-array-ld.github.io/netcdf-ld/

rabernat · 2019-02-12T19:33:10Z

And this
http://cf-json.org/specification

rabernat · 2019-02-12T19:53:58Z

p.s. I learned about this from @kwilcox in pangeo-data/pangeo-datastore#3. He might be a good person to loop into this discussion.

kwilcox · 2019-02-12T21:21:13Z

If you are interested, I could implement an xarray -> cf-json -> xarray round-trip. It will be a bit different from to_dict in terms of carrying through variable and attribute types so the file can be reproduced from the JSON. They both have their use-cases!

rabernat added 2 commits January 7, 2019 14:58

add data=False to to_dict methods

cc13203

doc and whats-new

1e65cc2

fix pep8 errors

7616ae3

shoyer approved these changes Jan 7, 2019


small tweaks

4551e24

shoyer approved these changes Jan 7, 2019


fmaussion reviewed Jan 7, 2019


added shape and dtype

4cf7bc8

rabernat mentioned this pull request Jan 11, 2019

WIP: html repr #1820

Closed


jhamman reviewed Jan 18, 2019


rabernat merged commit a7d55b9 into pydata:master Jan 21, 2019

rabernat mentioned this pull request Feb 8, 2019

hi from xarray binary-array-ld/bald#88

Open

rabernat mentioned this pull request Feb 12, 2019

STAC and other Prior Art pangeo-data/pangeo-datastore#3

Closed

andersy005 mentioned this pull request Jan 9, 2022

Serialization of just coordinates #2347

Closed
