Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved xarray support #2319

Closed
5 of 6 tasks
drs251 opened this issue Feb 8, 2018 · 24 comments
Closed
5 of 6 tasks

Improved xarray support #2319

drs251 opened this issue Feb 8, 2018 · 24 comments

Comments

@drs251
Copy link
Contributor

drs251 commented Feb 8, 2018

I noticed some points concerning xarray support in holoviews which could be improved:

  • Both xarray and holoviews support measurements units, but it appears that they are lost during in- and exporting. I think it would be pretty straightforward to add this feature in the holoviews importing and exporting code, I could look into it.

  • When importing a netcdf file via xarray and passing it to hv.Dataset, the key dimensions are sometimes ordered differently, depending whether the file is imported as a xr.DataArrayor xr.Dataset. This is not a gigantic problem, especially since one can manually specify the key dimensions' order, although it is still pretty annoying when dealing with images...
    EDIT: (actually, this seems to have been fixed already sometime between 1.9.3 and the current master)

  • I've never had any luck converting the data storage backend from 'xarray' to anything else, writing img.clone(datatype=['image']), as shown in the gridded datasets guide: http://holoviews.org/user_guide/Gridded_Datasets.html Perhaps it would be nice to show this more explicitly in the documentation, or fix it if it is a bug.

EDIT:

  • I've noticed that dimensions without coordinates are not accepted by XArrayInterface - although omitting coordinates seems to be the recommended way in xarray to deal with dimensions without meaningful coordinates (such as pixels in an image). In this case, it should be possible to just automatically assign simple range coordinates to these dimensions when making a holoviews Dataset.

  • The treatment of non-dimension coordinates could be more concise. Take for example the "rasm" example that comes with xarray, which looks like this

Coordinates:
  * time     (time) datetime64[ns] 1980-09-16T12:00:00 1980-10-17 ...
    xc       (y, x) float64 189.2 189.4 189.6 189.7 189.9 190.1 190.2 190.4 ...
    yc       (y, x) float64 16.53 16.78 17.02 17.27 17.51 17.76 18.0 18.25 ...
Dimensions without coordinates: x, y
Data variables:
    Tair     (time, y, x) float64 nan nan nan nan nan nan nan nan nan nan ...

Right now, coordinates are converted into kdims, and dimensions without coordinates are ignored. IMO, it would be better to let the user select if they want (time, y, c) or (time, yc, xc) coordinates when making the holoviews Dataset, where the default should be the proper dimensions.

  • The xarray interface in holoviews could also benefit from more documentation in general - i.e. an example of how to load an example file from xarray into holoviews and make a plot. Introducing xarray support by cloning from a different datatype does not seem very intuitive to me.
@philippjfr
Copy link
Member

I think it would be pretty straightforward to add this feature in the holoviews importing and exporting code, I could look into it.

Agreed this should be pretty straightforward, this is where you'd implement that.

When importing a netcdf file via xarray and passing it to hv.Dataset, the key dimensions are sometimes ordered differently, depending whether the file is imported as a xr.DataArrayor xr.Dataset.

Agreed, this should be consistent and should be handled in the same place as the previous issue.

I've never had any luck converting the data storage backend from 'xarray' to anything else, writing img.clone(datatype=['image']),

Not all conversions are currently supported, generally we only support converting the standard formats, i.e. dictionary based representations, but full ability to convert between all formats would be a good goal to shoot for.

@philippjfr
Copy link
Member

I've never had any luck converting the data storage backend from 'xarray' to anything else

Actually a bit of code in Dataset.clone could probably detect if the current datatype is in the list of datatypes that the user supplied and if not just drop down to a generic format. That will take care of almost all conversions I think.

@drs251
Copy link
Contributor Author

drs251 commented Apr 3, 2018

  • I found another little bug related to xarray importing: When importing and xarray DataArray, it's easy to specify units for the key dimensions, but the same does not work for value dimensions:
import holoviews as hv
import xarray as xr
import numpy as np
hv.notebook_extension()
%opts Image [colorbar=True]
xs = np.linspace(0, 2.5, 51)
ys = np.linspace(0, 1, 21)
zs = np.random.random((21, 51))
arr = xr.DataArray(zs, coords=[("y", ys), ("x", xs)], name="field")
x_dim = hv.Dimension("x", unit="m")
y_dim = hv.Dimension("y", unit="m")
z_dim = hv.Dimension("field", unit="mT")
dataset = hv.Dataset(arr, kdims=[x_dim, y_dim], vdims=[z_dim])
dataset.to(hv.Image)

The unit mT does not show up next to the colorbar.

More generally it would be nice to have improved support for changing dimension names, labels and units, not just for xarray related stuff - the .redim utility seems to be unable to change existing labels and units, which IMO is frustrating sometimes.

@philippjfr
Copy link
Member

It seems like your latest changes to xarray overwrite existing units on the dimensions.

the .redim utility seems to be unable to change existing labels and units, which IMO is frustrating sometimes.

Concrete examples would be appreciated, I'm not currently aware of any issues with .redim atm.

@drs251
Copy link
Contributor Author

drs251 commented Apr 3, 2018

It seems like your latest changes to xarray overwrite existing units on the dimensions.

That's a point I did not think about previously (I'm still using 1.9.5, not sure if it's a good idea to use the current master for everyday work). Ideally, specified key and value dimensions should override anything thats present in the xarray, but I haven't checked if that's the case - I'll look into it, perhaps another PR is needed.

Concrete examples would be appreciated, I'm not currently aware of any issues with .redim atm.

xdim = hv.Dimension("x", label="my x dimension", unit="mJ")
ydim = hv.Dimension("y", label="my y dimension", unit="kB")
zdim = hv.Dimension("z", label="my z dimension", unit="fs")
image = hv.Dataset(([0, 1], [1, 2], [[3, 4], [5, 6]]), kdims=[xdim, ydim], vdims=[zdim]).to(hv.Image)
image = image.redim(z='time')
image = image.redim.unit(x='eV', y='TB')
image = image.redim.label(x='Energy', y='Capacity')

Actually, I saw that my only issue with redim is that it refuses to change existing labels - everything else works like I would expect it to. So as far as I know, if you come across a dimension label you don't like, the only option is to extract the dimension values and rebuild the element from scratch. I would prefer if redim worked the same way on labels as it does on units and dimension names.

@philippjfr
Copy link
Member

Tbh I agree, @jlstevens you originally objected to being able to override the label, but imo this is an annoying and artificial restriction. What was there some convincing reasoning?

@jlstevens
Copy link
Contributor

The name and label define the identity of a dimension (e.g used to compare dimensions). As such the name and label should explicitly be declared correctly.

@philippjfr
Copy link
Member

And using .redim isn't explicit enough? You can already override the dimension as a whole using redim so I really don't see what is gained by disallowing .redim.label except for making things inconsistent and more verbose/difficult for the user.

@jlstevens
Copy link
Contributor

Explicitly declared.

@drs251
Copy link
Contributor Author

drs251 commented Apr 3, 2018

From my perspective, the label is what gets shown in the plot, and name is what you'd use to refer to the dimension programmatically, e.g. when using .reduce. I could understand if name were immutable and you could change around the label as you like, but I don't get why it's the other way around currently.

@jlstevens
Copy link
Contributor

jlstevens commented Apr 3, 2018

I actually agree with that @drs251 . Changing the dimension name was always a misfeature imho and I think it was introduced before label existed. Unfortunately, disabling the ability to change the name would now break backwards compatibility.

@philippjfr
Copy link
Member

Explicitly declared.

My point was that you can already override both name and label using .redim so I really don't see why changing the label should not be supported as well.

@jlstevens
Copy link
Contributor

My point was that you can already override both name and label using .redim so I really don't see why changing the label should not be supported as well.

I suppose so. The intended design is already broken.

@philippjfr
Copy link
Member

philippjfr commented Apr 3, 2018

Changing the dimension name was always a misfeature imho and I think it was introduced before label existed.

It's absolutely not a misfeature as it's crucial for certain operations, e.g. if we ever want to implement a melt operation and it's already used for similar operations in places.

I suppose so. The intended design is already broken.

How can you say that when completely replacing a dimension is an essential operation that absolutely must be supported.

@jlstevens
Copy link
Contributor

How can you say that when completely replacing a dimension is an essential operation that absolutely must be supported.

For operations perhaps, but for changing the title on a plot? All it does is make things more difficult to reason about. Do you really think the semantics of a dimension change because you want a different title?

@philippjfr
Copy link
Member

philippjfr commented Apr 3, 2018

For operations perhaps, but for changing the title on a plot?

For changing the title of a plot axis you indeed shouldn't change the whole dimension, especially the name, which is precisely why disallowing changing the label easily is such a pain. Instead of simply using .redim.label to set a new label a user will either have to replace the entire dimension or simply rename it, which is the worst possible option but also happens to be the most convenient. A user who simply wants to assign a new axis label gets an error when (s)he tries .redim.label(x='My custom label') and simply ends up using .redim(x='My custom label') because it works, but is actually infinitely worse from your perspective, because it replaces the entire dimension instead of just the label.

For operations perhaps

But we don't have distinct mechanisms to do this inside an operation, so are you suggesting introducing a whole other API? There's many reasons to change the names of dimensions so a user should be able to do it somehow that doesn't require knowledge about the type of data, redim has been the API to do that for a long time.

@jlstevens
Copy link
Contributor

Really, the point is that the title should be set as plot option, not by changing the dimension definition. What the dimension is has nothing to do with the title you want. Does your data fundamentally change in semantics because you want to tweak a title?

A user who simply wants to assign a new axis label gets an error when (s)he tries .redim.label(x='My custom label') and simply ends up using .redim(x='My custom label') because it works, but is actually infinitely worse from your perspective.

This I do agree with at least (as I mentioned above) which is a reason to allow the label to be changed. The entire idea of changing the core identity of a dimension to change a title is the real problem here.

@drs251
Copy link
Contributor Author

drs251 commented Apr 4, 2018

  • I also just noticed that the .to interface seems to be broken for datasets generated from xarrays on current master, not sure if it's related to my recent changes (but a test is definitely needed for this).

@philippjfr
Copy link
Member

I also just noticed that the .to interface seems to be broken for datasets generated from xarrays on current master, not sure if it's related to my recent changes (but a test is definitely needed for this).

Could you please provide an example? There are extensive unit tests for .to and .groupby which are all passing. If it is failing that is a critical bug and needs to be fixed before a release tomorrow.

@drs251
Copy link
Contributor Author

drs251 commented Apr 4, 2018

import holoviews as hv
import xarray as xr
import numpy as np

xs = [0.1, 0.2, 0.3]
ys = [0, 1]
zs = np.array([[0, 1], [2, 3], [4, 5]])
da = xr.DataArray(zs, coords=[('x_dim', xs), ('y_dim', ys)], name="data_name", dims=['y_dim', 'x_dim'])
da.attrs['long_name'] = "data long name"
da.attrs['units'] = "array_unit"
da.x_dim.attrs['units'] = "x_unit"
da.y_dim.attrs['long_name'] = "y axis long name"
dataset = hv.Dataset(da)
image = dataset.to(hv.Image)

I get

Traceback (most recent call last):
  File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/xarray/core/dataset.py", line 779, in _construct_dataarray
    variable = self._variables[name]
KeyError: 'x'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/danielstephan/python/holoviews/test.py", line 14, in <module>
    image = dataset.to(hv.Image)
  File "/Users/danielstephan/python/holoviews/holoviews/core/data/__init__.py", line 143, in __call__
    element = new_type(selected, **params)
  File "/Users/danielstephan/python/holoviews/holoviews/element/raster.py", line 256, in __init__
    Dataset.__init__(self, data, kdims=kdims, vdims=vdims, extents=extents, **params)
  File "/Users/danielstephan/python/holoviews/holoviews/core/data/__init__.py", line 199, in __init__
    super(Dataset, self).__init__(data, **dict(kwargs, **dict(dims, **extra_kws)))
  File "/Users/danielstephan/python/holoviews/holoviews/element/raster.py", line 50, in __init__
    super(Raster, self).__init__(data, kdims=kdims, vdims=vdims, extents=extents, **params)
  File "/Users/danielstephan/python/holoviews/holoviews/core/dimension.py", line 839, in __init__
    super(Dimensioned, self).__init__(data, **params)
  File "/Users/danielstephan/python/holoviews/holoviews/core/dimension.py", line 560, in __init__
    super(LabelledData, self).__init__(**params)
  File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/param/parameterized.py", line 1041, in __init__
    self._setup_params(**params)
  File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/param/parameterized.py", line 978, in override_initialization
    fn(parameterized_instance,*args,**kw)
  File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/param/parameterized.py", line 1441, in _setup_params
    setattr(self,name,val)
  File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/param/__init__.py", line 615, in __set__
    super(Number,self).__set__(obj,val)
  File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/param/__init__.py", line 452, in __set__
    if not obj: self._set_instantiate(dynamic)
  File "/Users/danielstephan/python/holoviews/holoviews/core/data/__init__.py", line 555, in __nonzero__
    return self.interface.nonzero(self)
  File "/Users/danielstephan/python/holoviews/holoviews/core/data/interface.py", line 351, in nonzero
    return bool(cls.length(dataset))
  File "/Users/danielstephan/python/holoviews/holoviews/core/data/xarray.py", line 432, in length
    return np.product([len(dataset.data[d.name]) for d in dataset.kdims])
  File "/Users/danielstephan/python/holoviews/holoviews/core/data/xarray.py", line 432, in <listcomp>
    return np.product([len(dataset.data[d.name]) for d in dataset.kdims])
  File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/xarray/core/dataset.py", line 873, in __getitem__
    return self._construct_dataarray(key)
  File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/xarray/core/dataset.py", line 782, in _construct_dataarray
    self._variables, name, self._level_coords, self.dims)
  File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/xarray/core/dataset.py", line 72, in _get_virtual_variable
    ref_var = variables[ref_name]
KeyError: 'x'

This does not happen in version 1.9.5

@jlstevens
Copy link
Contributor

This looks like the critical param bug that is in the process of being fixed. This will involve a new param release (hopefully today!) and the fix should simply be to install this new version (which will also be required by 1.10).

@philippjfr
Copy link
Member

Thanks for getting back to us so quickly @drs251, that is indeed related to param. I'm actually very glad to hear that this doesn't affect 1.9.5 because I had thought this was broken for everyone.

@philippjfr
Copy link
Member

I think all the issues here were addressed, if I'm mistaken please reopen a more specific issue.

Copy link

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants