BUG: Assigning values in SparseDataFrame with duplicate columns fails #14427

Closed
bkandel opened this issue Oct 14, 2016 · 5 comments · Fixed by #28425
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves Sparse Sparse Data Type

Comments

@bkandel
Contributor

bkandel commented Oct 14, 2016

As discussed in #14384 (comment).

A small, complete example of the issue

import pandas as pd
df1 = pd.DataFrame({'a': [1, 2, 3]})
df2 = pd.DataFrame({'b': [2, 3, 4]})
df = pd.concat([df1, df1, df2], axis=1).to_sparse()
df.index = [1, 2, 3]
df.loc[1, 'a'] = 3

errors with

AttributeError                            Traceback (most recent call last)
<ipython-input-6-0a670748626a> in <module>()
      4 df = pd.concat([df1, df1, df2], axis=1).to_sparse()
      5 df.index = [1, 2, 3]
----> 6 df.loc[1, 'a'] = 3

/Users/bkandel/.virtualenvs/pandas_19/lib/python2.7/site-packages/pandas/core/indexing.pyc in __setitem__(self, key, value)
    138             key = com._apply_if_callable(key, self.obj)
    139         indexer = self._get_setitem_indexer(key)
--> 140         self._setitem_with_indexer(indexer, value)
    141 
    142     def _has_valid_type(self, k, axis):

/Users/bkandel/.virtualenvs/pandas_19/lib/python2.7/site-packages/pandas/core/indexing.pyc in _setitem_with_indexer(self, indexer, value)
    545                 # scalar
    546                 for item in labels:
--> 547                     setter(item, value)
    548 
    549         else:

/Users/bkandel/.virtualenvs/pandas_19/lib/python2.7/site-packages/pandas/core/indexing.pyc in setter(item, v)
    453 
    454             def setter(item, v):
--> 455                 s = self.obj[item]
    456                 pi = plane_indexer[0] if lplane_indexer == 1 else plane_indexer
    457 

/Users/bkandel/.virtualenvs/pandas_19/lib/python2.7/site-packages/pandas/sparse/frame.pyc in __getitem__(self, key)
    345             return self._getitem_array(key)
    346         else:
--> 347             return self._get_item_cache(key)
    348 
    349     @Appender(DataFrame.get_value.__doc__, indents=0)

/Users/bkandel/.virtualenvs/pandas_19/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
   1385         if res is None:
   1386             values = self._data.get(item)
-> 1387             res = self._box_item_values(item, values)
   1388             cache[item] = res
   1389             res._set_as_cached(item, self)

/Users/bkandel/.virtualenvs/pandas_19/lib/python2.7/site-packages/pandas/core/frame.pyc in _box_item_values(self, key, values)
   2392         items = self.columns[self.columns.get_loc(key)]
   2393         if values.ndim == 2:
-> 2394             return self._constructor(values.T, columns=items, index=self.index)
   2395         else:
   2396             return self._box_col_values(values, items)

AttributeError: 'BlockManager' object has no attribute 'T'

Expected Output

   a  a  b
1  3  3  2
2  2  2  3
3  3  3  4
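For comparison, here is a minimal sketch of the same assignment on a dense DataFrame (the report's frames without `.to_sparse()`). It assumes a recent pandas in which `.loc` assignment to a duplicated column label writes to every matching column; it does not exercise the sparse code path that fails above.

```python
import pandas as pd

# Same inputs as the report, but the concatenated frame is kept dense.
df1 = pd.DataFrame({'a': [1, 2, 3]})
df2 = pd.DataFrame({'b': [2, 3, 4]})
dense = pd.concat([df1, df1, df2], axis=1)  # columns: ['a', 'a', 'b']
dense.index = [1, 2, 3]

# The label 'a' matches two columns, so the scalar is written to both.
dense.loc[1, 'a'] = 3
print(dense)
```

The printed frame should match the expected output above.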

Output of pd.show_versions()

## INSTALLED VERSIONS

commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.0
nose: None
pip: 8.1.2
setuptools: 28.3.0
Cython: None
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

@jreback
Contributor

jreback commented Oct 14, 2016

This is not tested at all. I suppose it could be supported.

jreback added the Bug, Indexing (Related to indexing on series/frames, not to indexes themselves), Sparse (Sparse Data Type) and Difficulty Intermediate labels Oct 14, 2016
jreback added this to the Next Major Release milestone Oct 14, 2016
@jreback
Contributor

jreback commented Oct 14, 2016

Will take community contributions.

@gfyoung
Member

gfyoung commented Oct 18, 2016

@jreback : Are there really no tests for indexing into a SparseDataFrame? That's surprising.

The bug essentially boils down to the fact that you cannot take the transpose of a BlockManager the way you can with what the dense path returns: df['a'] works when df = pd.concat([df1, df1, df2], axis=1), but not when df = pd.concat([df1, df1, df2], axis=1).to_sparse().

But how do you effectively implement transpose for BlockManager, especially since it is the internal structure behind all the other pandas objects?
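To make the dense side of this comparison concrete, here is a small sketch, assuming a recent pandas and using only the dense frame. It shows the two facts the traceback relies on in `_box_item_values`: `columns.get_loc('a')` matches more than one column (for this non-unique but monotonic index it returns a slice), so `df['a']` has to be boxed as a 2-D DataFrame rather than a Series.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3]})
df2 = pd.DataFrame({'b': [2, 3, 4]})
dense = pd.concat([df1, df1, df2], axis=1)  # columns: ['a', 'a', 'b']

# Non-unique but monotonic labels: get_loc returns a slice over both 'a' columns.
loc = dense.columns.get_loc('a')
print(loc)  # slice(0, 2, None)

# Selecting the duplicated label therefore yields a 2-D DataFrame, not a Series.
sub = dense['a']
print(type(sub).__name__, sub.shape)  # DataFrame (3, 2)
```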

@jreback
Contributor

jreback commented Oct 18, 2016

https://github.com/pandas-dev/pandas/blob/master/pandas/sparse/tests/test_indexing.py

Setting is tested very little with sparse.
It's pretty expensive to do in general if you are not setting an already-sparse value (I mean the actual operation, not the testing).
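One way to see why setting is costly is a toy model of sparse storage. The sketch below is hypothetical and is not pandas' actual SparseArray/BlockIndex code: it assumes only non-fill positions are stored, as sorted parallel lists. Overwriting a position that is already stored is cheap, while writing a non-fill value to an unstored position (or resetting a stored one to the fill value) changes the layout and shifts every later entry.

```python
from bisect import bisect_left


class ToySparseVector:
    """Hypothetical simplified sparse vector: only positions whose value
    differs from fill_value are stored, in sorted parallel lists."""

    def __init__(self, length, fill_value=0):
        self.length = length
        self.fill_value = fill_value
        self.indices = []  # sorted stored positions
        self.values = []   # values at those positions

    def __getitem__(self, pos):
        i = bisect_left(self.indices, pos)
        if i < len(self.indices) and self.indices[i] == pos:
            return self.values[i]
        return self.fill_value

    def set(self, pos, value):
        """Set one element; return True if the storage layout had to change."""
        if not 0 <= pos < self.length:
            raise IndexError(pos)
        i = bisect_left(self.indices, pos)
        stored = i < len(self.indices) and self.indices[i] == pos
        if stored and value != self.fill_value:
            # Cheap case: the position is already stored, overwrite in place.
            self.values[i] = value
            return False
        if stored:
            # Value becomes the fill value: drop it, shifting later entries (O(n)).
            del self.indices[i]
            del self.values[i]
            return True
        if value == self.fill_value:
            # Unstored position stays implicit; nothing to do.
            return False
        # Expensive case: insert a new stored position, shifting later entries (O(n)).
        self.indices.insert(i, pos)
        self.values.insert(i, value)
        return True


v = ToySparseVector(5)
print(v.set(3, 7))          # True: a new stored position was inserted
print(v.set(3, 9))          # False: overwrote an already-stored value
print(v.indices, v.values)  # [3] [9]
```

In this model the cost difference comes entirely from the list insert/delete; a real implementation with contiguous arrays would typically have to reallocate and copy instead.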

@gfyoung
Member

gfyoung commented Oct 18, 2016

@jreback : Fair enough, but the issue is not even setting a value. It's just indexing (i.e. the getter) that is breaking, which is even more problematic.
