DataFrame with MultiIndex -> xarray with sparse array #3206
Labels: topic-arrays (related to flexible array support)
shoyer changed the title from "MultiIndex -> sparse array" to "DataFrame with MultiIndex -> xarray with sparse array" on Aug 12, 2019
shoyer added a commit to shoyer/xarray that referenced this issue on Aug 13, 2019:
Fixes pydata#3206
Example usage:
    In [3]: import pandas as pd
       ...: import numpy as np
       ...: import xarray
       ...: df = pd.DataFrame({
       ...:     'w': range(10),
       ...:     'x': list('abcdefghij'),
       ...:     'y': np.arange(0, 100, 10),
       ...:     'z': np.ones(10),
       ...: }).set_index(['w', 'x', 'y'])
       ...:
    In [4]: ds = xarray.Dataset.from_dataframe(df, sparse=True)
    In [5]: ds.z.data
    Out[5]: <COO: shape=(10, 10, 10), dtype=float64, nnz=10, fill_value=nan>
This was referenced Aug 13, 2019
It would be great to have |
crusaderky pushed a commit that referenced this issue on Aug 27, 2019:
sparse=True option for from_dataframe and from_series Fixes #3206
Now that we have preliminary support for sparse arrays in xarray, one really cool feature we could explore is creating sparse arrays from MultiIndexed pandas DataFrames.
Right now, xarray's methods for creating objects from pandas always create dense arrays, but the size of these dense arrays can get big really quickly if the MultiIndex is sparsely populated, e.g.,
This length 10 DataFrame turned into a dense array with 1000 elements (only 10 of which are not NaN):
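As a rough sketch of the blow-up described above, the snippet below reuses the DataFrame from the commit message example in this thread (it may not be the exact frame the issue originally showed) and converts it with the default dense path. The variable names `dense`, `n_total` and `n_valid` are just for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Ten rows, indexed by a three-level MultiIndex where each level has
# ten distinct values (same frame as the commit message example).
df = pd.DataFrame({
    "w": range(10),
    "x": list("abcdefghij"),
    "y": np.arange(0, 100, 10),
    "z": np.ones(10),
}).set_index(["w", "x", "y"])

# The default conversion expands the MultiIndex into the full
# 10 x 10 x 10 product of its levels and fills the gaps with NaN.
ds = xr.Dataset.from_dataframe(df)
dense = ds["z"].values

n_total = dense.size                                # all cells in the dense array
n_valid = int(np.count_nonzero(~np.isnan(dense)))   # cells that came from the DataFrame

print(dense.shape, n_total, n_valid)
```

Only the cells that correspond to actual rows hold data; everything else is padding, and the padding grows with the product of the level sizes rather than with the number of rows.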
We can imagine xarray.Dataset.from_dataframe(df, sparse=True) would make the same Dataset, but with sparse arrays (with a NaN fill value) instead of dense arrays.
Once sparse arrays work pretty well, this could actually obviate most of the use cases for MultiIndex in arrays. Arguably the model is quite a bit cleaner.