BUG: df fails when columns arg is a list containing dupes #2079

ghost · 2012-10-17T05:41:41Z

In [1]: DataFrame(data,columns=["a","a"])

...
pandas/pandas/core/internals.pyc in _stack_dict(dct, ref_items, dtype)
1344 stacked = np.empty(shape, dtype=dtype)
1345 for i, item in enumerate(items):
-> 1346 stacked[i] = _asarray_compat(dct[item])
1347
1348 # stacked = np.vstack([_asarray_compat(dct[k]) for k in items])

IndexError: index out of bounds

5e6db32 is a failing test for this.

It looks like _to_sdict threads down to a call to _convert_object_array, which builds a dict
keyed on column names, so duplicate columns get squashed and you end up with a mismatch
between the length of the columns arg to df.__init__ and the data.
_to_sdict is not used for ndarrays, so this doesn't happen there. I was able to reuse
_init_ndarray for the case of columns being a flat list and have things work as expected.

Still, too much code touches this; better left to the core devs to decide how to handle it.
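
To illustrate the squashing described above, here is a minimal pure-Python sketch. It is not the pandas code: `build_column_dict` is a hypothetical stand-in for the dict-building step, and the data is made up for the example.

```python
# Hypothetical stand-in for the dict-building step; not the actual pandas code.
def build_column_dict(columns, arrays):
    """Key each array by its column name, as a plain dict would."""
    return dict(zip(columns, arrays))


columns = ["a", "a"]
arrays = [[1, 2], [3, 4]]

sdict = build_column_dict(columns, arrays)

# The second "a" overwrites the first, so only one entry survives,
# while the caller still expects one array per requested column.
n_requested = len(columns)
n_stored = len(sdict)
print(n_requested, n_stored)
```

Any later step that sizes its output from one of these counts and fills it from the other can run out of bounds, which would be consistent with the IndexError above.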

wesm · 2012-11-05T01:58:09Z

Fixing this is quite an undertaking, since there's a lot of existing constructor code that assumes unique column names. I'm on it; I'll probably get it sorted out over the next day or so.

ghost · 2012-11-05T07:53:09Z

Should this work?

pd.DataFrame.from_items([('a',['foo']),('a',['bar'])],columns=['a','a'])
Out[6]: 
     a    a
0  bar  bar
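
One possible explanation for this output (a guess, not confirmed anywhere in this thread) is that the items are still looked up by name somewhere, so the last 'a' wins for both columns. A pure-Python sketch of that difference, with hypothetical variable names:

```python
items = [("a", ["foo"]), ("a", ["bar"])]
columns = ["a", "a"]

# Lookup by name: the dict keeps only the last value stored under "a",
# so both requested columns resolve to the same data.
by_name = dict(items)
by_name_result = [by_name[c] for c in columns]

# Lookup by position: each column keeps the values it was given.
by_position = [values for _, values in items]

print(by_name_result)
print(by_position)
```

The positional result, with 'foo' in the first column and 'bar' in the second, is what I would expect here.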

ghost · 2012-11-05T07:58:44Z

Also, forgive the nitpick, but since sdict is now abandoned, it would be good to rename the methods
that reference it, to improve readability:

def _list_to_sdict(data, columns, coerce_float=False):
def _list_of_series_to_sdict(data, columns, coerce_float=False):
def _list_of_dict_to_sdict(data, columns, coerce_float=False):

wesm · 2012-11-05T15:23:21Z

Yeah that should work.

ghost mentioned this issue Oct 17, 2012

TST: DF should not error when 'columns' is a list that contains dupes #2036

Closed

wesm closed this as completed in b1b85ae Nov 5, 2012

wesm reopened this Nov 5, 2012

wesm added a commit that referenced this issue Nov 5, 2012

BUG: more DataFrame constructor refactoring for duplicate columns. #2079

8842a06

wesm closed this as completed Nov 5, 2012

reidy-p mentioned this issue Jan 25, 2018

DEPR: Deprecate from_items #18529

Merged

4 tasks
