DataFrame.apply with axis=1 returning (also erroring) different results when returning a list #17970

tdpetrou · 2017-10-25T01:59:09Z

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(data=np.random.randint(0, 5, (5, 3)),
...                   columns=['a', 'b', 'c'])
>>> df
   a  b  c
0  4  0  0
1  2  0  1
2  2  2  2
3  1  2  2
4  3  0  0

>>> df.apply(lambda x: list(range(2)), axis=1)  # returns a Series
0    [0, 1]
1    [0, 1]
2    [0, 1]
3    [0, 1]
4    [0, 1]
dtype: object

>>> df.apply(lambda x: list(range(3)), axis=1) # returns a DataFrame
   a  b  c
0  0  1  2
1  0  1  2
2  0  1  2
3  0  1  2
4  0  1  2

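>>> i = 0
>>> def f(x):
...     global i
...     if i == 0:  # first call: length matches the number of columns
...         i += 1
...         return list(range(3))
...     return list(range(4))  # later calls: one element too many
...
>>> df.apply(f, axis=1)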
ValueError: Shape of passed values is (5, 4), indices imply (5, 3)

Problem description

There are three possible outcomes. When the length of the returned list equals the number of columns, a DataFrame is returned and each column gets the corresponding value from the list.

If the length of the returned list does not equal the number of columns, a Series of lists is returned.

If the length of the returned list equals the number of columns for the first row, but at least one other row returns a list of a different length, a ValueError is raised.

Expected Output

The behavior should be consistent. It should probably default to a Series of lists in all three cases.
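Until `apply` behaves consistently, one way to get a Series of lists in every case is to bypass `apply` and build the Series directly. This is only a sketch: `rows_to_lists` is a hypothetical helper name, not part of pandas, and it assumes row-by-row iteration is fast enough for the data at hand.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 5, (5, 3)), columns=["a", "b", "c"])


def rows_to_lists(frame, func):
    """Call func on each row and return a Series holding whatever func returns.

    Hypothetical helper: it sidesteps DataFrame.apply, so the length of the
    returned lists never affects the type of the result.
    """
    return pd.Series([func(row) for _, row in frame.iterrows()], index=frame.index)


two = rows_to_lists(df, lambda row: list(range(2)))    # shorter than the column count
three = rows_to_lists(df, lambda row: list(range(3)))  # same as the column count
ragged = rows_to_lists(df, lambda row: list(range(int(row["a"]))))  # varying lengths
```

All three calls return an object-dtype Series indexed like `df`, whatever the list lengths are.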

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0rc1
pytest: 3.0.7
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.25.2
numpy: 1.13.3
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.0.0
sphinx: 1.5.5
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.5.0

jonathanrocher · 2017-11-10T19:49:32Z

The problem is wider. I am running into the same bug with the following:

>>> df = pd.DataFrame({"a": [1, 2, 3]})
>>> df.apply(lambda row: np.ones(1), axis=1)
     a
0  1.0
1  1.0
2  1.0
>>> df.apply(lambda row: np.ones(2), axis=1)
ValueError: Shape of passed values is (3, 2), indices imply (3, 1)

Related to #17437 (where there are some comments from @jreback )
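A possible workaround, assuming the goal is one output column per array element: have the function return a `pd.Series` instead of a bare ndarray, so the output shape does not have to be guessed from the array length. A minimal sketch, not taken from the pandas source:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Returning a Series makes the intended columns explicit: its index
# (0 and 1 here) becomes the column labels of the result.
wide = df.apply(lambda row: pd.Series(np.ones(2)), axis=1)
```

Here `wide` is a DataFrame with one row per input row and two float columns labelled `0` and `1`, even though `df` has only one column.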

jreback · 2017-11-10T20:12:41Z

this is a duplicate of #17437 & #15628.

tdpetrou · 2017-11-10T20:15:09Z

@jreback Do the others cover the three possible outcomes? It's really bizarre behavior.

jreback · 2017-11-10T20:39:42Z

@tdpetrou having lists as elements is the bizarre part. These are not in any way supported. Thus the apply behavior is really undefined. If you want to have a look, go right ahead. This is an edge case which requires apply to basically guess at user intentions.

closes pandas-dev#16353 closes pandas-dev#17348 closes pandas-dev#17437 closes pandas-dev#18573 closes pandas-dev#17970 closes pandas-dev#17892 closes pandas-dev#17602 closes pandas-dev#18775 closes pandas-dev#18901 closes pandas-dev#18919

jorisvandenbossche · 2018-01-28T19:18:46Z

FYI, this will be fixed in #18577
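For later readers: as far as I know, that change shipped in pandas 0.23.0 together with a `result_type` argument to `DataFrame.apply`. The sketch below assumes that version or newer. The frame contents are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [4, 2, 2], "b": [0, 0, 2], "c": [0, 1, 2]})

# Default: a list-like return value is kept as an element, giving a Series
# of lists even when the list length equals the number of columns.
as_lists = df.apply(lambda row: list(range(3)), axis=1)

# 'expand': each list element becomes its own column, labelled 0, 1, 2.
expanded = df.apply(lambda row: list(range(3)), axis=1, result_type="expand")

# 'broadcast': the values are written back into the original shape,
# keeping the original column labels.
broadcast = df.apply(lambda row: list(range(3)), axis=1, result_type="broadcast")
```

With the result type spelled out, the output no longer depends on whether the list length happens to match the number of columns.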

tdpetrou · 2018-01-28T19:57:06Z

@jorisvandenbossche Personally, I would disallow any complex data structure from being an element in a pandas DataFrame, especially if they are not supported.

jorisvandenbossche · 2018-01-28T20:00:04Z

You can comment on the PR if you want. But changing that would be a big backwards compatibility break (much bigger than the current PR).
And they are in some way supported, just discouraged.

jreback closed this as completed Nov 10, 2017

jreback added the Apply (Apply, Aggregate, Transform, Map) and Duplicate Report (Duplicate issue or pull request) labels Nov 10, 2017

jreback added this to the No action milestone Nov 10, 2017

jreback mentioned this issue Nov 30, 2017

API/BUG: .apply will correctly infer output shape when axis=1 #18577

Merged

jreback modified the milestones: No action, 0.22.0 Nov 30, 2017

jorisvandenbossche pushed a commit that referenced this issue Feb 7, 2018

API/BUG: .apply will correctly infer output shape when axis=1 (#18577)

6b0c7e7

closes #16353 closes #17348 closes #17437 closes #18573 closes #17970 closes #17892 closes #17602 closes #18775 closes #18901 closes #18919
