BUG: SparseDataFrame indexing sometimes loses `fill_value` of empty columns in 0.24 #25378

scottgigante · 2019-02-19T23:34:57Z

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
X = pd.SparseDataFrame([[0,1], [0,0]], default_fill_value=0.0)
## Good behaviour
X.loc[0].to_numpy()
# array([0., 1.])
X.loc[[0]].to_numpy()
# array([[0., 1.]])
X.iloc[0].to_numpy()
# array([0., 1.])
## Bad behaviour
X.iloc[[0]].to_numpy()
# array([[nan, 1]], dtype=object)
X.loc[[True, False]].to_numpy()
# array([[nan, 1]], dtype=object)

Problem description

Indexing a SparseDataFrame with iloc and a list of row positions should return the same result as indexing the same rows with loc and the corresponding labels. Instead, iloc with a list (and loc with a boolean mask) drops the fill_value of any column that has no non-zero entries, so those entries come back as NaN.

Expected Output

All commands should return array([0., 1.]) (allowing for differences between 1- and 2-D output). The last two (iloc with fancy indexing, and loc with boolean indexing) instead return array([[nan, 1]], dtype=object).

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-17763-Microsoft
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.1
pytest: None
pip: 18.0
setuptools: 40.2.0
Cython: 0.29
numpy: 1.15.1
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 7.1.1
sphinx: 1.6.7
patsy: None
dateutil: 2.7.3
pytz: 2018.3
blosc: None
bottleneck: None
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None


TomAugspurger · 2019-02-20T00:57:23Z

FYI we’re likely deprecating SparseDataFrame. You’re probably better off switching to a regular data frame with sparse columns.
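For reference, a minimal sketch of what that switch could look like for the example in this issue. It assumes pandas 0.25 or later, where `pd.SparseDtype` and the `DataFrame.sparse` accessor are available; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Build a regular DataFrame, then convert every column to a sparse dtype
# with an explicit fill_value of 0.0 (instead of using SparseDataFrame).
dense = pd.DataFrame([[0.0, 1.0], [0.0, 0.0]])
sparse_dtype = pd.SparseDtype("float64", fill_value=0.0)
X = dense.astype(sparse_dtype)

# The two indexing patterns reported as misbehaving in this issue.
rows_by_position = X.iloc[[0]]
rows_by_mask = X.loc[[True, False]]

# Densify through the .sparse accessor before converting to NumPy.
by_position = rows_by_position.sparse.to_dense().to_numpy()
by_mask = rows_by_mask.sparse.to_dense().to_numpy()

print(by_position)
print(by_mask)
```

With sparse columns, the fill_value is stored on each column's dtype, so it does not depend on whether a given column happens to contain any non-fill entries.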


scottgigante · 2019-02-20T18:18:45Z

I don't see any mention of this in the documentation. Can you please post a link?

TomAugspurger · 2019-02-20T18:42:23Z

Not deprecated yet: #19239


gfyoung · 2019-02-21T08:25:31Z

Marking as a bug for now, but given that this isn't a regression, it would likely be patched in 0.25.0. However, even with the deprecation, a patch would be welcomed if it isn't too difficult.

gfyoung added the Missing-data, Sparse, and Bug labels Feb 21, 2019

gfyoung added this to the Contributions Welcome milestone Feb 21, 2019

TomAugspurger mentioned this issue Sep 16, 2019: Remove SparseSeries and SparseDataFrame #28425 (merged)

TomAugspurger closed this as completed in #28425 Sep 18, 2019
