BUG: SparseDataFrame indexing sometimes loses fill_value of empty columns in 0.24 #25378
FYI we’re likely deprecating SparseDataFrame. You’re probably better off switching to a regular data frame with sparse columns.
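As a rough sketch of that suggestion, the frame from this report could be rebuilt as a regular DataFrame whose columns are sparse arrays. This assumes a pandas version that provides pd.arrays.SparseArray and pd.SparseDtype; the variable names (dense, frame, first_row) are illustrative only and do not come from the thread.

```python
import numpy as np
import pandas as pd

# Same values as the SparseDataFrame in the report.
dense = np.array([[0.0, 1.0], [0.0, 0.0]])

# One SparseArray per column; each column carries its own fill_value,
# including column 0, which has no non-zero entries.
frame = pd.DataFrame(
    {
        col: pd.arrays.SparseArray(dense[:, col], fill_value=0.0)
        for col in range(dense.shape[1])
    }
)

# Every column should report a sparse dtype with fill_value 0.0.
column_dtypes = frame.dtypes

# Fancy positional indexing, the case that misbehaves in the report.
first_row = frame.iloc[[0]].to_numpy()
```

Here the fill value lives on each column's dtype rather than on the frame, so it can be inspected directly with frame.dtypes.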
… On Feb 19, 2019, at 5:35 PM, Scott Gigante ***@***.***> wrote:
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd
X = pd.SparseDataFrame([[0,1], [0,0]], default_fill_value=0.0)
## Good behaviour
X.loc[0].to_numpy()
# array([0., 1.])
X.loc[[0]].to_numpy()
# array([[0., 1.]])
X.iloc[0].to_numpy()
# array([0., 1.])
## Bad behaviour
X.iloc[[0]].to_numpy()
# array([[nan, 1]], dtype=object)
X.loc[[True, False]].to_numpy()
# array([[nan, 1]], dtype=object)
Problem description
Indexing a SparseDataFrame with iloc and more than a single row number should return the same result as indexing the same rows with loc and the corresponding indices. Instead, iloc drops column fill_value for any column with no non-zero entries.
Expected Output
All commands should return array([0., 1.]) (allowing for differences between 1-D and 2-D output). The last two (iloc with fancy indexing, and loc with boolean indexing) instead return array([nan, 1.]).
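The expectation above can be written as a small consistency check. This is only a sketch: single_row_views and indexers_agree are hypothetical helper names, not pandas API, and the check accepts any DataFrame, so it is not specific to SparseDataFrame. It also assumes a NumPy version in which np.array_equal accepts equal_nan.

```python
import numpy as np
import pandas as pd


def single_row_views(frame, position=0):
    """Fetch the row at `position` through five indexers, flattened to 1-D."""
    label = frame.index[position]
    mask = np.arange(len(frame)) == position  # boolean row selector
    return {
        "loc[label]": frame.loc[label].to_numpy().ravel(),
        "loc[[label]]": frame.loc[[label]].to_numpy().ravel(),
        "iloc[pos]": frame.iloc[position].to_numpy().ravel(),
        "iloc[[pos]]": frame.iloc[[position]].to_numpy().ravel(),
        "loc[mask]": frame.loc[mask].to_numpy().ravel(),
    }


def indexers_agree(frame, position=0):
    """Return True when every indexer yields the same row values."""
    views = [
        np.asarray(values, dtype=float)
        for values in single_row_views(frame, position).values()
    ]
    # equal_nan=True so genuinely missing values still compare equal;
    # a fill value replaced by NaN in only some views is reported as a mismatch.
    return all(np.array_equal(views[0], v, equal_nan=True) for v in views[1:])


# Usage on an ordinary dense frame holding the same values as the report.
dense_frame = pd.DataFrame([[0.0, 1.0], [0.0, 0.0]])
dense_ok = indexers_agree(dense_frame)
```

Flattening each result before comparing is what lets the 1-D and 2-D outputs be treated as the same row.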
Output of pd.show_versions()
I don't see any mention of this in the documentation. Can you please post a link?
Not deprecated yet: #19239
…On Wed, Feb 20, 2019 at 12:18 PM Scott Gigante ***@***.***> wrote:
I don't see any mention of this in the documentation. Can you please post
a link?
gfyoung added the Missing-data, Sparse, and Bug labels on Feb 21, 2019
Marking as a bug for now, but given that this isn't a regression, it would likely be patched in