Follow-ups on MultiIndex support #719

Closed
6 of 7 tasks
shoyer opened this issue Jan 17, 2016 · 7 comments

@shoyer
Member

shoyer commented Jan 17, 2016

xref #702

  • Serialization to NetCDF
  • Better repr, showing level names/dtypes?
  • Indexing a scalar at a particular level should drop that level from the MultiIndex (MultiIndex and data selection #767)
  • Make levels accessible as coordinate variables (e.g., ds['time'] can pull out the 'time' level of a multi-index)
  • Support indexing with levels, e.g., ds.sel(time='2000-01').
  • Make isel_points/sel_points return objects with a MultiIndex? (probably after the previous TODO, so we can preserve basic backwards compatibility) (deferred until we figure out Indexing with alignment and broadcasting #974)
  • Add set_index/reset_index/swaplevel to make it easier to create and manipulate multi-indexes
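For reference, the "drop that level" behavior asked for in the third item can be seen in pandas itself. The snippet below is a pandas analogue only, not xarray code; the series values are made up for illustration, and the level names are borrowed from the repr examples later in this thread.

```python
import pandas as pd

# Two-level index matching the example used later in the thread.
index = pd.MultiIndex.from_arrays(
    [["foo", "foo", "bar"], [0, 1, 2]],
    names=["level_0", "level_1"],
)
series = pd.Series([10, 20, 30], index=index)  # illustrative values

# Selecting a scalar on one level drops that level from the result:
# only "level_1" remains in the index.
selected = series.xs("foo", level="level_0")
print(selected.index.names)
print(selected.tolist())
```

The request in the list is for xarray selection to behave the same way when a scalar is given for a single level.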
@benbovy
Member

benbovy commented Mar 25, 2016

Thinking about serialization, a possible solution would be splitting the multi-index into separate coordinates for the same dimension, then assign some specific attributes (e.g., xarray_idx_name, xarray_idx_level) to each of these coordinates so that it is possible to further rebuild the multi-index.
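A minimal sketch of that split-and-rebuild idea, in plain Python with no xarray or netCDF calls. The attribute names xarray_idx_name and xarray_idx_level come from the comment above; the dict layout and the helper names split_multi_index and rebuild_multi_index are hypothetical and exist only for this illustration.

```python
def split_multi_index(name, level_names, tuples):
    """Split index tuples into one flat coordinate per level (hypothetical helper)."""
    coords = {}
    for position, level_name in enumerate(level_names):
        coords[level_name] = {
            "values": [t[position] for t in tuples],
            # Attributes that let a reader reassemble the index later.
            "attrs": {"xarray_idx_name": name, "xarray_idx_level": position},
        }
    return coords


def rebuild_multi_index(coords, name):
    """Collect the coordinates tagged with `name` and zip them back into tuples."""
    members = [
        (c["attrs"]["xarray_idx_level"], level_name, c["values"])
        for level_name, c in coords.items()
        if c["attrs"].get("xarray_idx_name") == name
    ]
    members.sort(key=lambda m: m[0])  # restore the original level order
    level_names = [m[1] for m in members]
    tuples = list(zip(*(m[2] for m in members)))
    return level_names, tuples


original = [("foo", 0), ("foo", 1), ("bar", 2)]
stored = split_multi_index("dim_0", ["level_0", "level_1"], original)
names, rebuilt = rebuild_multi_index(stored, "dim_0")
```

Each per-level coordinate is an ordinary one-dimensional variable along the same dimension, which is what makes it straightforward to write to a file format that has no multi-index concept.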

More generally, a couple of methods acting on coordinates only can be a complement to stack/unstack methods (acting on both coordinates and dimensions). Merge/split xarray coordinates into/from a multi-index seem very straightforward to implement.

Similarly, a better repr would represent levels as sub-coordinates:

<xarray.DataArray (dim_0: 3)>
array([0, 1, 2])
Coordinates:
  * dim_0    (dim_0) object
        level_0 (0) object 'foo' 'foo' 'bar'
        level_1 (1) int64 0 1 2

Any thoughts on this (I can start PRs)?

@shoyer
Member Author

shoyer commented Mar 25, 2016

> Thinking about serialization, a possible solution would be splitting the multi-index into separate coordinates for the same dimension, then assign some specific attributes (e.g., xarray_idx_name, xarray_idx_level) to each of these coordinates so that it is possible to further rebuild the multi-index.

I think index_level would suffice for the serialization, unless we're interested in supporting serializing multiple multi-indexes along a dimension. I don't see much use for that, so I think it's fine to only support serializing multi-indexes that are actually used as an index.

> More generally, a couple of methods acting on coordinates only can be a complement to stack/unstack methods (acting on both coordinates and dimensions). Merge/split xarray coordinates into/from a multi-index seem very straightforward to implement.

Agreed. I would suggest calling these set_index and reset_index, mirroring pandas, unless the API ends up differing enough that this would be confusing.
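For readers who have not used the pandas methods being mirrored, here is a short pandas-only example of the round trip. It shows the pandas API, not the eventual xarray one; the column names and values are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "level_0": ["foo", "foo", "bar"],
        "level_1": [0, 1, 2],
        "value": [0, 1, 2],
    }
)

# set_index merges the two columns into a MultiIndex on the rows.
indexed = df.set_index(["level_0", "level_1"])

# reset_index splits the MultiIndex back into ordinary columns.
restored = indexed.reset_index()
```

The xarray counterparts discussed here would do the same thing with coordinates along a dimension instead of DataFrame columns.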

> Similarly, a better repr would represent levels as sub-coordinates

Yes, also agreed! Here are a few ideas for the repr:

Coordinates:
  * dim_0       (dim_0) MultiIndex
    - level_0   object 'foo' 'foo' 'bar'
    - level_1   int64 0 1 2
  • maybe use another indicator (like -) for indenting sublevels like shown above
  • consider writing MultiIndex as "dtype" for the upper level. Or maybe keep object as the dtype but write MultiIndex for the values.
  • I don't see much use in showing the integer index level in the repr. We could consider showing the number in the MultiIndex sublevel, though I can see that getting confusing (especially because the number of elements per level isn't reduced when doing indexing).
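As a rough illustration of the layout sketched above, the following plain-Python function prints a multi-index coordinate with "-" marking each sublevel. The function name format_multi_index_coord, its arguments, and the column widths are assumptions made for this sketch; it is not xarray's formatting code.

```python
def format_multi_index_coord(dim, levels):
    """Format one multi-index coordinate (hypothetical helper).

    `levels` maps each level name to a (dtype string, list of values) pair.
    """
    # Top line: "MultiIndex" written where the dtype would normally appear.
    lines = [f"  * {dim:<11} ({dim}) MultiIndex"]
    for name, (dtype, values) in levels.items():
        rendered = " ".join(repr(v) for v in values)
        # Sublevels are indented and prefixed with "-".
        lines.append(f"    - {name:<9} {dtype} {rendered}")
    return "\n".join(lines)


text = format_multi_index_coord(
    "dim_0",
    {
        "level_0": ("object", ["foo", "foo", "bar"]),
        "level_1": ("int64", [0, 1, 2]),
    },
)
print("Coordinates:")
print(text)
```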

@benbovy
Member

benbovy commented Mar 25, 2016

Relevant comments!

> Agreed. I would suggest calling these set_index and reset_index, mirroring pandas, unless the API ends up differing enough that this would be confusing.

OK! I didn't make the link.

@shoyer
Member Author

shoyer commented Jul 31, 2016

It might be worth considering making the MultiIndex levels the primary coordinates, and making the multi-index itself the "virtual" coordinate that only appears when indexed. This would suggest a repr like:

Coordinates:
  * level_0   (dim_0) object 'foo' 'foo' 'bar'
  * level_1   (dim_0) int64 0 1 2

This probably wouldn't make sense until we have support for most of the other API niceties on this list (including indexing with .sel, which I just added).

@benbovy
Member

benbovy commented Oct 20, 2016

@shoyer it would be nice if we could close this issue before the next major release (0.9.0). There are only a few (though important) issues to fix in #1028, and right after it is merged I can open a new PR for serialization to NetCDF, for which I don't see any big issue.

Given the new indexing behavior you suggest in #974, I guess that having MultiIndex support for isel_points / sel_points wouldn't be needed anymore, although MultiIndex interaction with this alternative indexing behavior must be addressed (but maybe later).

@shoyer
Member Author

shoyer commented Oct 20, 2016

@benbovy Agreed. My thought is that 0.9.0 should focus on index improvements -- finishing up optional indexes and multi-index support.

I crossed sel_points off the TODO list.

@stale

stale bot commented Jan 24, 2019

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity
If this issue remains relevant, please comment here; otherwise it will be marked as closed automatically

@stale stale bot added the stale label Jan 24, 2019
@stale stale bot closed this as completed Feb 23, 2019