weird behavior using apply() creating list based on current columns #16321

Closed
STguerin opened this issue May 10, 2017 · 1 comment
Labels
Duplicate Report Duplicate issue or pull request


@STguerin

Code Sample, a copy-pastable example if possible

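import numpy as np
import pandas as pd

# Mixed-type rows, including missing values
d = [['hello', 1, 'GOOD', 'long.kw'],
     [1.2, 'chipotle', np.nan, 'bingo'],
     ['various', np.nan, 3000, 123.456]]
t = pd.DataFrame(data=d, columns=['A', 'B', 'C', 'D'])
# Build a per-row list of the four columns; this is the assignment that misbehaves
t['combined'] = t.apply(lambda x: [x['A'], x['B'], x['C'], x['D']], axis=1)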

Problem description

I am confused about why this does not work properly. If I first initialize the 'combined' column to 0, it works. I understand that this is a sub-optimal approach, but I am wondering why it breaks.

Expected Output

t['combined'] = t.values.tolist()
t
Out[80]: 
         A         B     C        D                       combined
0    hello         1  GOOD  long.kw      [hello, 1, GOOD, long.kw]
1     1.20  chipotle   NaN    bingo    [1.2, chipotle, nan, bingo]
2  various       NaN  3000   123.46  [various, nan, 3000, 123.456]

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: None

@TomAugspurger
Contributor

See #15628 and issues linking to / from that.

The short version is that DataFrame.apply tries to infer the output type from the result. Here, the result of your function is inferred to be a DataFrame with the same columns.
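
To see this inference in practice, a small check can print what `DataFrame.apply` hands back before anything is assigned to a column. This is only a sketch: the variable name `result` is illustrative, and the returned type depends on the installed pandas version (the report above is against 0.19.2), so no particular output is assumed here.

```python
import numpy as np
import pandas as pd

d = [['hello', 1, 'GOOD', 'long.kw'],
     [1.2, 'chipotle', np.nan, 'bingo'],
     ['various', np.nan, 3000, 123.456]]
t = pd.DataFrame(data=d, columns=['A', 'B', 'C', 'D'])

# Same row-wise function as in the report, but the result is kept
# in a separate variable instead of being assigned to a column.
result = t.apply(lambda x: [x['A'], x['B'], x['C'], x['D']], axis=1)

# A DataFrame here means the row lists were expanded back into columns;
# a Series means each row kept its list as a single object.
print(type(result).__name__, result.shape)
```

If the printed type is a DataFrame, assigning it to the single column `t['combined']` cannot work as intended, which matches the explanation above.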

You're probably better off with something like:

In [51]: pd.Series([list(x) for x in t.itertuples(index=False)])
Out[51]:
0        [hello, 1, GOOD, long.kw]
1      [1.2, chipotle, nan, bingo]
2    [various, nan, 3000, 123.456]
dtype: object
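
Assigning that Series back as the new column could look like the sketch below. The names `cols` and `rows` are illustrative, and passing `index=t.index` is an added precaution (not part of the suggestion above) so that the lists line up with the frame's rows even if its index is not the default range.

```python
import numpy as np
import pandas as pd

d = [['hello', 1, 'GOOD', 'long.kw'],
     [1.2, 'chipotle', np.nan, 'bingo'],
     ['various', np.nan, 3000, 123.456]]
t = pd.DataFrame(data=d, columns=['A', 'B', 'C', 'D'])

# Select the source columns explicitly so that re-running this after
# 'combined' exists does not fold the new column into itself.
cols = ['A', 'B', 'C', 'D']

# One plain Python list per row, built without going through apply().
rows = [list(row) for row in t[cols].itertuples(index=False)]

# Wrap the lists in an object Series aligned to the frame's index.
t['combined'] = pd.Series(rows, index=t.index)
print(t)
```

Because the lists are built in plain Python, `apply` never gets a chance to reinterpret them as columns.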
