Sparse dataframe fails when astype() is called with dictionary #26227

Closed
stefansimik opened this issue Apr 27, 2019 · 4 comments · Fixed by #28425
Labels
Dtype Conversions (Unexpected or buggy dtype conversions), Sparse (Sparse Data Type)

Comments

@stefansimik

stefansimik commented Apr 27, 2019

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

# Create DataFrame
df = pd.DataFrame({'A': [1.0, 2.0, np.nan, 3.0], 'B': [100, 200, 300, 400]})

# Conversion mapping
convert_cols_dict = {'A': np.float32, 'B': np.int32}

# WORKS: conversion on a dense DataFrame
df_optimized = df.astype(convert_cols_dict)

# FAILS: conversion on a SparseDataFrame raises KeyError
df_sparse = df.to_sparse()
df_sparse_optimized = df_sparse.astype(convert_cols_dict)

Problem description

Column dtypes cannot be converted with a dict mapping when working with a sparse DataFrame.
Calling .astype(...) with a dict fails with this error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-62-fcedc5114473> in <module>
      1 # Conversion on sparse_dataframe fails
      2 df_sparse = df.to_sparse()
----> 3 df_sparse_optimized = df_sparse.astype(convert_cols_dict)
      4 df_sparse_optimized

D:\data_science\bin\anaconda3\envs\nbo2\lib\site-packages\pandas\core\sparse\frame.py in astype(self, dtype)
    349 
    350     def astype(self, dtype):
--> 351         return self._apply_columns(lambda x: x.astype(dtype))
    352 
    353     def copy(self, deep=True):

D:\data_science\bin\anaconda3\envs\nbo2\lib\site-packages\pandas\core\sparse\frame.py in _apply_columns(self, func)
    342 
    343         new_data = {col: func(series)
--> 344                     for col, series in compat.iteritems(self)}
    345 
    346         return self._constructor(

D:\data_science\bin\anaconda3\envs\nbo2\lib\site-packages\pandas\core\sparse\frame.py in <dictcomp>(.0)
    342 
    343         new_data = {col: func(series)
--> 344                     for col, series in compat.iteritems(self)}
    345 
    346         return self._constructor(

D:\data_science\bin\anaconda3\envs\nbo2\lib\site-packages\pandas\core\sparse\frame.py in <lambda>(x)
    349 
    350     def astype(self, dtype):
--> 351         return self._apply_columns(lambda x: x.astype(dtype))
    352 
    353     def copy(self, deep=True):

D:\data_science\bin\anaconda3\envs\nbo2\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors, **kwargs)
   5659             if self.ndim == 1:  # i.e. Series
   5660                 if len(dtype) > 1 or self.name not in dtype:
-> 5661                     raise KeyError('Only the Series name can be used for '
   5662                                    'the key in Series dtype mappings.')
   5663                 new_type = dtype[self.name]

KeyError: 'Only the Series name can be used for the key in Series dtype mappings.'
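
Reading the traceback, SparseDataFrame.astype appears to forward the whole dict to every column, and Series.astype rejects a mapping that has more than one key or a key other than the Series name. The sketch below reproduces that check on a plain Series and shows one way a mapping could be applied column by column. The helper name astype_per_column is hypothetical, used only for illustration, and is not a pandas API or the fix that was eventually merged.

```python
import numpy as np
import pandas as pd

mapping = {'A': np.float32, 'B': np.int32}
col_a = pd.Series([1.0, 2.0, np.nan, 3.0], name='A')

# A Series accepts a dict only if its single key is the Series name,
# so passing the full two-column mapping raises KeyError.
try:
    col_a.astype(mapping)
    raised = False
except KeyError:
    raised = True


def astype_per_column(frame, dtype_map):
    """Hypothetical helper: give each column only its own target dtype."""
    return pd.DataFrame({
        col: (series.astype(dtype_map[col]) if col in dtype_map else series)
        for col, series in frame.items()
    })


df = pd.DataFrame({'A': [1.0, 2.0, np.nan, 3.0], 'B': [100, 200, 300, 400]})
result = astype_per_column(df, mapping)
print(raised)
print(result.dtypes)
```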

Expected Output

It is expected that what works on a dense DataFrame should also work on a sparse DataFrame.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.1
pytest: None
pip: 19.0.1
setuptools: 40.8.0
Cython: None
numpy: 1.15.4
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

WillAyd added the Dtype Conversions and Sparse labels on Apr 29, 2019
@simonjayhawkins
Member

@stefansimik Thanks for the report.

After a quick look at allowing SparseDataFrame.astype to work with a dict, the code sample then raises

ValueError: Cannot convert non-finite values (NA or inf) to integer

which I believe is a separate issue, xref #25694.
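
For reference, the same message can be produced on an ordinary dense Series that holds NaN and is cast to an integer dtype. This is only a sketch of the general behavior. Whether the sparse code path reaches the error for the same reason is an assumption here, not something confirmed in this thread.

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, np.nan, 3.0])

# NaN has no integer representation, so the cast is refused.
try:
    values.astype(np.int32)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```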

It is expected that what works on a dense DataFrame should also work on a sparse DataFrame.

This is not the case at the moment: SparseDataFrame.astype is implemented independently of DataFrame.astype (which is inherited directly from NDFrame.astype).

SparseDataFrame.astype also has a different signature: it does not support the copy and errors arguments, nor the ability to pass kwargs to the constructor.

@TomAugspurger
Contributor

TomAugspurger commented Apr 29, 2019

I'd recommend using a DataFrame with sparse values, rather than a SparseDataFrame. Then a regular DataFrame.astype should work.
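
A minimal sketch of what "a DataFrame with sparse values" could look like, assuming a pandas version where pd.SparseDtype and the Series.sparse accessor are available. The fill values (NaN for column A, 0 for column B) and the variable names are illustrative choices, not taken from the thread. Explicit sparse target dtypes are used in the second step so that the converted columns stay sparse.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, np.nan, 3.0], 'B': [100, 200, 300, 400]})

# A regular DataFrame whose columns hold sparse values.
sparse_df = df.astype({
    'A': pd.SparseDtype('float64', np.nan),
    'B': pd.SparseDtype('int64', 0),
})

# The dict mapping goes through the regular DataFrame.astype,
# which hands each column only its own target dtype.
converted = sparse_df.astype({
    'A': pd.SparseDtype('float32', np.nan),
    'B': pd.SparseDtype('int32', 0),
})

print(converted.dtypes)
```

Calling converted['B'].sparse.to_dense() returns the column as an ordinary dense Series when one is needed.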

@stefansimik
Author

stefansimik commented Apr 29, 2019

Thanks @TomAugspurger.
Could you please show a few lines of code illustrating what you mean by 'using a DataFrame with sparse values'?
Or do you have any useful links that go into more depth on using 'sparse' structures in practice?

It is quite hard to find good explanations and practical examples of how pandas sparse structures are meant to be used. The pandas docs contain only basic content, which is possibly obsolete: I have read around here that SparseDataFrame is going to be deprecated and that using sparse DataFrames is not the recommended approach for the near future...

@TomAugspurger
Contributor

TomAugspurger commented Apr 29, 2019 via email
