weird behavior using apply() creating list based on current columns #16321

STguerin · 2017-05-10T17:04:30Z

Code Sample, a copy-pastable example if possible

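import numpy as np
import pandas as pd

d = [['hello', 1, 'GOOD', 'long.kw'],
     [1.2, 'chipotle', np.nan, 'bingo'],
     ['various', np.nan, 3000, 123.456]]
t = pd.DataFrame(data=d, columns=['A', 'B', 'C', 'D'])
# Try to collect each row's values into a single list-valued column
t['combined'] = t.apply(lambda x: [x['A'], x['B'], x['C'], x['D']], axis=1)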

Problem description

I am confused about why this is not working properly. If I initialize the 'combined' column to 0 first, it works. I understand that this is a sub-optimal approach, but I am just wondering why it breaks.

Expected Output

t['combined'] = t.values.tolist()
t
Out[80]: 
         A         B     C        D                       combined
0    hello         1  GOOD  long.kw      [hello, 1, GOOD, long.kw]
1     1.20  chipotle   NaN    bingo    [1.2, chipotle, nan, bingo]
2  various       NaN  3000   123.46  [various, nan, 3000, 123.456]

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: None

TomAugspurger · 2017-05-10T18:29:08Z

See #15628 and issues linking to / from that.

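The short version is that DataFrame.apply tries to infer the output type from what the function returns. Here each row returns a list with as many elements as there are columns, so the result is inferred to be a DataFrame with the same columns rather than a Series of lists.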
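To make the two possible outcomes of that inference visible, here is a minimal sketch. It assumes pandas 0.23 or later, where `apply` accepts a `result_type` argument; the 0.19.2 release reported in this issue does not have it. The helper name `row_to_list` is made up for illustration.

```python
import numpy as np
import pandas as pd

d = [['hello', 1, 'GOOD', 'long.kw'],
     [1.2, 'chipotle', np.nan, 'bingo'],
     ['various', np.nan, 3000, 123.456]]
t = pd.DataFrame(data=d, columns=['A', 'B', 'C', 'D'])


def row_to_list(row):
    # Hypothetical helper: return the row's values as a plain Python list.
    return list(row)


# 'expand' turns each returned list into columns, giving a DataFrame.
# Assumption: pandas >= 0.23 (result_type is not available in 0.19.2).
expanded = t.apply(row_to_list, axis=1, result_type='expand')

# 'reduce' keeps each returned list as a single element, giving a Series.
reduced = t.apply(row_to_list, axis=1, result_type='reduce')

print(type(expanded).__name__, expanded.shape)
print(type(reduced).__name__, len(reduced))
```

Only the `reduce`-style result can be assigned to a single column such as `t['combined']`; the expanded one has one column per list element.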

You're probably better off building the lists directly, with something like:

In [51]: pd.Series([list(x) for x in t.itertuples(index=False)])
Out[51]:
0        [hello, 1, GOOD, long.kw]
1      [1.2, chipotle, nan, bingo]
2    [various, nan, 3000, 123.456]
dtype: object
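For completeness, a small sketch of how that Series could be assigned back as the 'combined' column, which is what the original code was trying to do. Passing `index=t.index` is an addition here (not in the snippet above) so the rows align even if the frame does not use a default index.

```python
import numpy as np
import pandas as pd

d = [['hello', 1, 'GOOD', 'long.kw'],
     [1.2, 'chipotle', np.nan, 'bingo'],
     ['various', np.nan, 3000, 123.456]]
t = pd.DataFrame(data=d, columns=['A', 'B', 'C', 'D'])

# The right-hand side is evaluated before 'combined' exists,
# so each list holds only the four original columns.
t['combined'] = pd.Series(
    [list(row) for row in t.itertuples(index=False)],
    index=t.index,  # assumption: keep alignment explicit for non-default indexes
)

print(t['combined'])
```

This avoids `apply` entirely, so no output-shape inference is involved.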

TomAugspurger closed this as completed May 10, 2017

TomAugspurger added the Duplicate Report label May 10, 2017

TomAugspurger added this to the No action milestone May 10, 2017

jreback mentioned this issue Aug 28, 2017

Strange behaviour when trying to create a series from two columns of a dataframe with apply(tuple, axis=1) #17348

Closed
