DataFrame.apply with axis=1 returning (also erroring) different results when returning a list #17970

Closed
tdpetrou opened this issue Oct 25, 2017 · 7 comments · Fixed by #18577

@tdpetrou
Contributor

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(data=np.random.randint(0, 5, (5, 3)),
...                   columns=['a', 'b', 'c'])
>>> df
   a  b  c
0  4  0  0
1  2  0  1
2  2  2  2
3  1  2  2
4  3  0  0

>>> df.apply(lambda x: list(range(2)), axis=1)  # returns a Series
0    [0, 1]
1    [0, 1]
2    [0, 1]
3    [0, 1]
4    [0, 1]
dtype: object

>>> df.apply(lambda x: list(range(3)), axis=1) # returns a DataFrame
   a  b  c
0  0  1  2
1  0  1  2
2  0  1  2
3  0  1  2
4  0  1  2

>>> i = 0
>>> def f(x):
...     # The first call returns 3 items; every later call returns 4.
...     global i
...     if i == 0:
...         i += 1
...         return list(range(3))
...     return list(range(4))

>>> df.apply(f, axis=1) 
ValueError: Shape of passed values is (5, 4), indices imply (5, 3)

Problem description

There are three possible outcomes, depending on the length of the list returned for each row.

If the length of every returned list equals the number of columns, a DataFrame is returned and each column gets the corresponding value from the list.

If the length of the returned list does not equal the number of columns, a Series of lists is returned.

If the length of the returned list equals the number of columns for the first row, but at least one later row returns a list of a different length, a ValueError is raised.
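
The three outcomes above can be summarized as a small decision rule. The following is a pure-Python sketch of the behavior as described in this report, not pandas' actual implementation; `classify_apply_result` is a hypothetical name. The report does not cover the case where the first row mismatches but a later row matches, so the sketch assumes a Series of lists there.

```python
def classify_apply_result(n_columns, returned_lengths):
    """Predict the outcome described in this report (toy model, not pandas code).

    n_columns: number of columns in the DataFrame
    returned_lengths: length of the list returned for each row, in row order
    """
    if not returned_lengths:
        raise ValueError("need at least one row")
    # Per the report, a first row that does not match the column count
    # yields a Series of lists. (Assumption: later rows do not change this.)
    if returned_lengths[0] != n_columns:
        return "Series of lists"
    # The first row matched, so every later row has to match as well.
    if all(n == n_columns for n in returned_lengths):
        return "DataFrame"
    return "ValueError"


print(classify_apply_result(3, [2, 2, 2, 2, 2]))  # list(range(2)) example
print(classify_apply_result(3, [3, 3, 3, 3, 3]))  # list(range(3)) example
print(classify_apply_result(3, [3, 4, 4, 4, 4]))  # stateful f example
```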

Expected Output

The behavior should be consistent. It should probably default to a Series of lists in all three cases.
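
Until the behavior is consistent, one possible workaround is to build the Series by hand so that the length of the returned lists never matters. This is only a sketch; `apply_rows_as_lists` is a hypothetical helper, not part of pandas, and it will be slower than `apply` on large frames.

```python
import numpy as np
import pandas as pd


def apply_rows_as_lists(frame, func):
    """Call func on each row and always return a Series of its results."""
    # Collecting the results first and building the Series directly
    # avoids any shape inference based on the number of columns.
    values = [func(row) for _, row in frame.iterrows()]
    return pd.Series(values, index=frame.index, dtype=object)


df = pd.DataFrame(np.random.randint(0, 5, (5, 3)), columns=["a", "b", "c"])

short = apply_rows_as_lists(df, lambda row: list(range(2)))
same = apply_rows_as_lists(df, lambda row: list(range(3)))
print(short)
print(same)
```

Both calls return an object-dtype Series with one list per row, including the case where the list length equals the number of columns.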

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0rc1
pytest: 3.0.7
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.25.2
numpy: 1.13.3
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.0.0
sphinx: 1.5.5
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.5.0

@jonathanrocher

The problem is wider. I am running into the same bug with the following:

>>> df = pd.DataFrame({"a": [1, 2, 3]})
>>> df.apply(lambda row: np.ones(1), axis=1)
     a
0  1.0
1  1.0
2  1.0
>>> df.apply(lambda row: np.ones(2), axis=1)
ValueError: Shape of passed values is (3, 2), indices imply (3, 1)

Related to #17437 (where there are some comments from @jreback )

@jreback
Contributor

jreback commented Nov 10, 2017

this is a duplicate of #17437 & #15628.

@jreback jreback closed this as completed Nov 10, 2017
@jreback jreback added Apply Apply, Aggregate, Transform, Map Duplicate Report Duplicate issue or pull request labels Nov 10, 2017
@jreback jreback added this to the No action milestone Nov 10, 2017
@tdpetrou
Contributor Author

@jreback Do the others cover the three possible outcomes? It's really bizarre behavior.

@jreback
Contributor

jreback commented Nov 10, 2017

@tdpetrou having lists as elements is the bizarre part. These are not in any way supported, so the apply behavior is really undefined. If you want to have a look, go right ahead. This is an edge case which requires apply to basically guess at user intentions.

@jreback jreback modified the milestones: No action, 0.22.0 Nov 30, 2017
@jorisvandenbossche
Member

FYI, this will be fixed in #18577
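
For readers on later pandas releases, a hedged sketch: `DataFrame.apply` accepts a `result_type` argument (the pandas docs mark it as new in 0.23.0) that makes the choice explicit instead of inferring it from the list length. Whether this is exactly what #18577 introduced is an assumption here, since this thread does not say. The small frame below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [4, 2, 2], "b": [0, 0, 2], "c": [0, 1, 2]})

# 'reduce': keep each returned list as a single element of a Series.
as_lists = df.apply(lambda row: list(range(3)), axis=1, result_type="reduce")

# 'expand': turn each returned list into columns, whatever its length.
expanded = df.apply(lambda row: list(range(2)), axis=1, result_type="expand")

# 'broadcast': keep the original columns; the list length must match them.
broadcast = df.apply(lambda row: list(range(3)), axis=1, result_type="broadcast")

print(as_lists)
print(expanded)
print(broadcast)
```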

@tdpetrou
Contributor Author

@jorisvandenbossche Personally, I would disallow complex data structures as elements of a pandas DataFrame, especially if they are not supported.

@jorisvandenbossche
Member

You can comment on the PR if you want. But changing that would be a big backwards compatibility break (much bigger than the current PR).
And they are in some way supported, just discouraged.

jorisvandenbossche pushed a commit that referenced this issue Feb 7, 2018