XArrayInterface improvements: dimension units and labels (#2431)
drs251 authored and philippjfr committed Mar 15, 2018
1 parent 543983b commit 41e02f1
Showing 3 changed files with 129 additions and 2 deletions.
78 changes: 77 additions & 1 deletion examples/user_guide/08-Gridded_Datasets.ipynb
@@ -292,6 +292,69 @@
"heatmap + heatmap.table()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Working with xarray data types\n",
"As demonstrated previously, `Dataset` comes with support for the `xarray` library, which offers a powerful way to work with multi-dimensional, regularly spaced data. In this example, we'll load an example dataset, turn it into a HoloViews `Dataset` and visualize it. First, let's have a look at the xarray dataset's contents:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"xr_ds = xr.tutorial.load_dataset(\"air_temperature\")\n",
"xr_ds"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is trivial to turn this xarray Dataset into a HoloViews `Dataset` (the same also works for a DataArray):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hv_ds = hv.Dataset(xr_ds)[:, :, \"2013-01-01\"]\n",
"print(hv_ds)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have used the usual slice notation to select a single day from the rather large dataset. Finally, let's visualize the dataset by converting it to a `HoloMap` of `Image` elements using the `to()` method. We need to specify which of the dataset's key dimensions will be consumed by the images (in this case \"lat\" and \"lon\"), while the remaining key dimensions will be associated with the HoloMap (here: \"time\"). We'll use the slice notation again to clip the longitude."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%opts Image [colorbar=True]\n",
"%%output size=200\n",
"hv_ds.to(hv.Image, kdims=[\"lon\", \"lat\"], dynamic=False)[:, 220:320, :]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, we have explicitly specified the default behaviour `dynamic=False`, which returns a HoloMap. Note that this approach immediately converts all available data to images, which can take up a lot of RAM for large datasets. In these situations, use `dynamic=True` to generate a [DynamicMap](./06-Live_Data.ipynb) instead. Additionally, [xarray features dask support](http://xarray.pydata.org/en/stable/dask.html), which is helpful when dealing with large amounts of data.\n",
"\n",
"Additional examples of visualizing xarray data in the context of geographical data can be found in the GeoViews documentation: [Gridded Datasets I](http://geo.holoviews.org/Gridded_Datasets_I.html) and\n",
"[Gridded Datasets II](http://geo.holoviews.org/Gridded_Datasets_II.html). These guides also contain useful information on the interaction between xarray data structures and HoloViews Datasets in general."
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -399,9 +462,22 @@
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"pygments_lexer": "ipython3"
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
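
The notebook text added in this file contrasts `dynamic=False` (all images built immediately) with `dynamic=True` (images generated on demand). The following is only a plain-Python analogy of that trade-off, not HoloViews code: `render_frame` and the `calls` list are hypothetical names introduced here to count how many frames get built in each mode.

```python
from typing import Callable, Dict, List

calls: List[str] = []  # records which frames were actually built


def render_frame(day: str) -> str:
    """Hypothetical stand-in for building one image from one time step."""
    calls.append(day)
    return f"image for {day}"


days = ["2013-01-01", "2013-01-02", "2013-01-03"]

# Eager, analogous to dynamic=False: every frame is built up front
# and all of them are held in memory at once.
eager: Dict[str, str] = {day: render_frame(day) for day in days}
eager_calls = len(calls)

# Lazy, analogous to dynamic=True: keep only the callable and build
# a frame at the moment it is requested.
calls.clear()
lazy: Callable[[str], str] = render_frame
requested = lazy("2013-01-02")
lazy_calls = len(calls)

print(eager_calls, lazy_calls)
```

In the eager case the cost grows with the number of time steps even if only one frame is ever viewed, which is the memory concern the notebook raises for large datasets.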
12 changes: 11 additions & 1 deletion holoviews/core/data/xarray.py
@@ -80,6 +80,7 @@ def init(cls, eltype, data, kdims, vdims):
cls)
vdims = [vdim]
data = data.to_dataset(name=vdim.name)

if not isinstance(data, xr.Dataset):
if kdims is None:
kdims = kdim_param.default
@@ -120,8 +121,9 @@ def init(cls, eltype, data, kdims, vdims):
for c in data.coords:
if c not in kdims and set(data[c].dims) == set(virtual_dims):
kdims.append(c)
vdims = [vd if isinstance(vd, Dimension) else Dimension(vd) for vd in vdims]
kdims = [kd if isinstance(kd, Dimension) else Dimension(kd) for kd in kdims]

kdims = [d if isinstance(d, Dimension) else Dimension(d) for d in kdims]
not_found = []
for d in kdims:
if not any(d.name == k or (isinstance(v, xr.DataArray) and d.name in v.dims)
@@ -133,6 +135,14 @@ def init(cls, eltype, data, kdims, vdims):
raise DataError("xarray Dataset must define coordinates "
"for all defined kdims, %s coordinates not found."
% not_found, cls)

# retrieve units and labels from Dataset:
for d in kdims + vdims:
    d.unit = data[d.name].attrs.get('units')
    label = data[d.name].attrs.get('long_name')
    if label is not None:
        d.label = label

return data, {'kdims': kdims, 'vdims': vdims}, {}
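
To make the effect of the new loop easier to follow, here is a minimal sketch of the same lookup logic using only the standard library. It is not the HoloViews implementation: `Dim` is a hypothetical stand-in for `Dimension` (assuming the label defaults to the name), and `attrs_by_name` is a plain dict standing in for `data[d.name].attrs`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Dim:
    """Hypothetical stand-in for a Dimension (name, label and unit only)."""
    name: str
    label: Optional[str] = None
    unit: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption: the label falls back to the name when not given.
        if self.label is None:
            self.label = self.name


def apply_attrs(dims: List[Dim],
                attrs_by_name: Dict[str, Dict[str, Any]]) -> List[Dim]:
    """Copy 'units' and 'long_name' metadata onto each dimension."""
    for d in dims:
        attrs = attrs_by_name.get(d.name, {})
        # Unconditional assignment, as in the diff: a missing 'units'
        # key leaves the unit as None.
        d.unit = attrs.get("units")
        label = attrs.get("long_name")
        if label is not None:
            d.label = label
    return dims


# Metadata mirroring the values used in test_xarray_dataset_names_and_units.
attrs = {
    "x_dim": {"units": "x_unit"},
    "y_dim": {"long_name": "y axis long name"},
    "data_name": {"long_name": "data long name", "units": "array_unit"},
}
dims = apply_attrs([Dim("x_dim"), Dim("y_dim"), Dim("data_name")], attrs)
for d in dims:
    print(d)
```

Note that the unit is assigned unconditionally while the label is only replaced when `long_name` is present, so in this sketch a dimension without a `units` attribute ends up with `unit=None`.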


41 changes: 41 additions & 0 deletions tests/core/data/testdataset.py
@@ -1753,6 +1753,47 @@ def test_xarray_dataset_with_scalar_dim_canonicalize(self):
        expected = np.array([[0, 1], [2, 3], [4, 5]])
        self.assertEqual(canonical, expected)

    def test_xarray_dataset_names_and_units(self):
        import xarray as xr
        xs = [0.1, 0.2, 0.3]
        ys = [0, 1]
        zs = np.array([[0, 1], [2, 3], [4, 5]])
        da = xr.DataArray(zs, coords=[('x_dim', xs), ('y_dim', ys)], name="data_name", dims=['y_dim', 'x_dim'])
        da.attrs['long_name'] = "data long name"
        da.attrs['units'] = "array_unit"
        da.x_dim.attrs['units'] = "x_unit"
        da.y_dim.attrs['long_name'] = "y axis long name"
        dataset = Dataset(da)
        self.assertEqual(dataset.get_dimension("x_dim"), Dimension("x_dim", unit="x_unit"))
        self.assertEqual(dataset.get_dimension("y_dim"), Dimension("y_dim", label="y axis long name"))
        self.assertEqual(dataset.get_dimension("data_name"),
                         Dimension("data_name", label="data long name", unit="array_unit"))

    def test_xarray_dataset_dataarray_vs_dataset(self):
        import xarray as xr
        xs = [0.1, 0.2, 0.3]
        ys = [0, 1]
        zs = np.array([[0, 1], [2, 3], [4, 5]])
        da = xr.DataArray(zs, coords=[('x_dim', xs), ('y_dim', ys)], name="data_name", dims=['y_dim', 'x_dim'])
        da.attrs['long_name'] = "data long name"
        da.attrs['units'] = "array_unit"
        da.x_dim.attrs['units'] = "x_unit"
        da.y_dim.attrs['long_name'] = "y axis long name"
        ds = da.to_dataset()
        dataset_from_da = Dataset(da)
        dataset_from_ds = Dataset(ds)
        self.assertEqual(dataset_from_da, dataset_from_ds)
        # same with reversed names:
        da_rev = xr.DataArray(zs, coords=[('x_dim', xs), ('y_dim', ys)], name="data_name", dims=['x_dim', 'y_dim'])
        da_rev.attrs['long_name'] = "data long name"
        da_rev.attrs['units'] = "array_unit"
        da_rev.x_dim.attrs['units'] = "x_unit"
        da_rev.y_dim.attrs['long_name'] = "y axis long name"
        ds_rev = da_rev.to_dataset()
        dataset_from_da_rev = Dataset(da_rev)
        dataset_from_ds_rev = Dataset(ds_rev)
        self.assertEqual(dataset_from_da_rev, dataset_from_ds_rev)

    def test_dataset_array_init_hm(self):
        "Tests support for arrays (homogeneous)"
        raise SkipTest("Not supported")