UniformNdMapping types (such as NdOverlay, HoloMap, NdLayout and GridSpace) wrap one or more Elements, adding outer indices to the data. This means that, at least in theory, they can always be reduced to a single Element containing the union of the container's and the element's dimensions. The current API for this is the `table` method. As a very simple example, let's take three curves and collapse them using the `table` method:
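|   | Groups | x | y |
|---|--------|---|---|
| 0 | A | 0 | 0 |
| 1 | A | 1 | 1 |
| 2 | A | 2 | 2 |
| 0 | B | 0 | 0 |
| 1 | B | 1 | 1 |
| 2 | B | 2 | 2 |
| 0 | C | 0 | 0 |
| 1 | C | 1 | 1 |
| 2 | C | 2 | 2 |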
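Conceptually, the collapse in this table concatenates each curve's `x`/`y` columns and turns the container's outer key into an extra `Groups` column. Here is a minimal pandas sketch of that idea. It is not HoloViews code: the `curves` dict stands in for the NdOverlay, and `collapse_tabular` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical stand-in for an NdOverlay: the outer key ('Groups') maps to
# the columnar data of each Curve element (key dimension 'x', value 'y').
curves = {g: pd.DataFrame({'x': [0, 1, 2], 'y': [0, 1, 2]}) for g in 'ABC'}

def collapse_tabular(mapping, outer_dim):
    """Union of the container's key dimension and the element dimensions."""
    frames = []
    for key, frame in mapping.items():
        # Add the outer key as a leading column; keep the element's own index.
        frames.append(frame.assign(**{outer_dim: key})[[outer_dim, *frame.columns]])
    return pd.concat(frames)

table = collapse_tabular(curves, 'Groups')
print(table)
```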
Collapsing the data with the `table` method in this way can be useful, but since a `Table` is usually columnar, it makes little sense for gridded data. Second, we also have a `collapse` method on `HoloMap`, which first combines the data into a table like the one above and then applies an aggregation. This approach is highly inefficient for gridded data. It is also incorrect, because once an image has been converted to tabular format it cannot easily be converted back.
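To make the difference concrete, here is a NumPy-only sketch (not HoloViews code) that averages three images over an outer dimension in two ways. The tabular route flattens every pixel into rows, groups them, and then has to rebuild the grid from the coordinate columns. The gridded route stacks the images and reduces along one axis. The names `images`, `rows` and `cube` are illustrative.

```python
import numpy as np

# Three hypothetical 2x3 "images" on shared y/x coordinates, keyed by an
# outer 'time' dimension the way a HoloMap would key its Images.
ys, xs = np.array([0.5, 1.5]), np.array([0.5, 1.5, 2.5])
base = np.arange(6, dtype=float).reshape(2, 3)
images = {t: base * t for t in range(3)}

# Tabular route: flatten every image into (time, y, x, z) rows...
yy, xx = np.meshgrid(ys, xs, indexing='ij')
rows = np.concatenate([
    np.column_stack([np.full(img.size, t), yy.ravel(), xx.ravel(), img.ravel()])
    for t, img in images.items()
])
# ...group the values by (y, x) to aggregate over 'time'...
groups = {}
for _, y, x, z in rows:
    groups.setdefault((y, x), []).append(z)
# ...and rebuild the grid by looking up each coordinate's position.
rebuilt = np.empty((len(ys), len(xs)))
for (y, x), zs in groups.items():
    rebuilt[np.searchsorted(ys, y), np.searchsorted(xs, x)] = np.mean(zs)

# Gridded route: stack into a (time, y, x) cube and reduce along axis 0.
cube = np.stack([images[t] for t in images])
mean_grid = cube.mean(axis=0)
```

Both routes give the same result here, but the tabular one touches every pixel several times and only works because the coordinates could be mapped back onto a regular grid.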
I therefore propose that we provide an API that combines both tabular and gridded datasets correctly, without always converting to a tabular format. In practical terms, this means implementing `Interface.concat` methods for the gridded interfaces and replacing the `table` method with a more general `.to_dataset` method (or similar). This would allow collapsing a HoloMap/GridSpace/NdOverlay of Images/Rasters etc. into an n-dimensional cube. Once that is implemented, `HoloMap.collapse` will work for gridded datasets again.
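A minimal sketch of what a gridded concat could do, assuming each element is represented as a plain `(coords, array)` pair rather than a real HoloViews Image. `concat_gridded` and `collapse` are hypothetical functions, not the actual `Interface.concat` or `HoloMap.collapse` signatures. The point is that elements sharing a grid can be stacked along a new leading axis for the container's key dimension, and aggregation then reduces along that axis without ever building a table.

```python
import numpy as np

def concat_gridded(mapping, dim):
    """Hypothetical gridded counterpart to Interface.concat (a sketch).

    ``mapping`` maps outer key values to ``(coords, array)`` pairs, where
    ``coords`` is a dict of 1D coordinate arrays in array-axis order.
    """
    keys = list(mapping)
    coords0, _ = mapping[keys[0]]
    for key in keys[1:]:
        coords, _ = mapping[key]
        # Elements can only be stacked if they share an identical grid.
        same = list(coords) == list(coords0) and all(
            np.array_equal(coords[k], coords0[k]) for k in coords0)
        if not same:
            raise ValueError(f'coordinates of element {key!r} do not match')
    # Stack along a new leading axis for the container's key dimension.
    cube = np.stack([mapping[k][1] for k in keys])
    return {dim: np.asarray(keys), **coords0}, cube

def collapse(coords, cube, dim, function=np.mean):
    """Aggregate over one dimension of the cube, keeping the grid intact."""
    axis = list(coords).index(dim)
    remaining = {k: v for k, v in coords.items() if k != dim}
    return remaining, function(cube, axis=axis)

# Usage: three 2x3 images keyed by a hypothetical 'time' dimension.
grid = {'y': np.arange(2), 'x': np.arange(3)}
hmap = {t: (grid, np.full((2, 3), float(t))) for t in range(3)}
cube_coords, cube = concat_gridded(hmap, 'time')
img_coords, mean_img = collapse(cube_coords, cube, 'time')
```

The result of `collapse` is again a gridded `(coords, array)` pair, which is exactly what the table-based approach cannot guarantee.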
Another way of looking at this is as the reverse of a `.groupby` or `.to` conversion: a multi-dimensional dataset can be expanded into multiple elements in a container type, and `.to_dataset` would do the reverse, collapsing them back down into a single Dataset.
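The round-trip property can be illustrated with a pandas analogue (a sketch, not the HoloViews API): `groupby` here splits a table into a dict of per-key elements, and a hypothetical `to_dataset` concatenates them back, re-inserting the key as a column. Applying one after the other should return the original data.

```python
import pandas as pd

def groupby(df, dim):
    """Expand a dataset into {key: element} along one dimension."""
    return {key: group.drop(columns=dim).reset_index(drop=True)
            for key, group in df.groupby(dim, sort=False)}

def to_dataset(mapping, dim):
    """Hypothetical inverse of groupby: collapse the container again."""
    frames = [elem.assign(**{dim: key}) for key, elem in mapping.items()]
    columns = [dim, *next(iter(mapping.values())).columns]
    return pd.concat(frames, ignore_index=True)[columns]

df = pd.DataFrame({'Groups': list('AAABBBCCC'),
                   'x': [0, 1, 2] * 3, 'y': [0, 1, 2] * 3})
elements = groupby(df, 'Groups')        # expand: one element per group
roundtrip = to_dataset(elements, 'Groups')  # collapse back into one table
```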
I believe this could also be leveraged for an efficient storage protocol: on serialization, complex containers could be collapsed into a monolithic datastore representing a large table or multi-dimensional array. The collapsed data could then be stored along with a spec (recall my prototype for expressing `.to` specifications) to recreate the complex container on deserialization. That way a serialization tool could take advantage of large datastores such as a database, PyTables or NetCDF by collapsing the data down to a single data structure that can be stored efficiently.
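A rough sketch of the idea, using a NumPy `.npz` archive as a stand-in for a larger datastore such as NetCDF or PyTables. The spec layout (`container`, `kdims`, `keys`, `element_dims`) and the `serialize`/`deserialize` helpers are invented for illustration; they are not the `.to` specification prototype referred to in the paragraph. The container is collapsed into a single cube, stored together with its coordinates and a JSON spec, and rebuilt from the spec on load.

```python
import io
import json
import numpy as np

def serialize(mapping, dim, coords):
    """Collapse {key: 2D array} into one cube plus a JSON spec (a sketch)."""
    keys = list(mapping)
    cube = np.stack([mapping[k] for k in keys])
    # Hypothetical spec describing how to re-expand the cube on load.
    spec = {'container': 'HoloMap', 'kdims': [dim], 'keys': keys,
            'element_dims': list(coords)}
    buf = io.BytesIO()
    np.savez(buf, cube=cube, spec=np.array(json.dumps(spec)),
             **{f'coord_{k}': v for k, v in coords.items()})
    return buf.getvalue()

def deserialize(blob):
    """Read the monolithic store and re-expand it using the spec."""
    with np.load(io.BytesIO(blob)) as data:
        spec = json.loads(data['spec'].item())
        cube = data['cube']
        coords = {k: data[f'coord_{k}'] for k in spec['element_dims']}
    mapping = {key: cube[i] for i, key in enumerate(spec['keys'])}
    return spec, coords, mapping

grid = {'y': np.arange(2), 'x': np.arange(3)}
hmap = {t: np.full((2, 3), float(t)) for t in range(3)}
blob = serialize(hmap, 'time', grid)
spec, restored_coords, restored = deserialize(blob)
```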