API: should apply also follow result_type for axis=0 ? #19570

Open
jorisvandenbossche opened this issue Feb 7, 2018 · 0 comments
Labels
API Design Apply Apply, Aggregate, Transform, Map Enhancement Needs Discussion Requires discussion from core team before further action

Comments

@jorisvandenbossche
Member

Follow-up issue on #18577

In that PR @jreback cleaned up the apply(..., axis=1) result shape inconsistencies, and we added a keyword to control this.

For example, when the applied function returns an array or a list, apply now returns a Series of those objects by default, and only expands or broadcasts the result to multiple columns if you pass result_type explicitly:

In [1]: df = pd.DataFrame(np.tile(np.arange(3), 4).reshape(4, -1) + 1, columns=['A', 'B', 'C'], index=pd.date_range("2012-01-01", periods=4))

In [2]: df
Out[2]: 
            A  B  C
2012-01-01  1  2  3
2012-01-02  1  2  3
2012-01-03  1  2  3
2012-01-04  1  2  3

In [3]: df.apply(lambda x: np.array([0, 1, 2]), axis=1)
Out[3]: 
2012-01-01    [0, 1, 2]
2012-01-02    [0, 1, 2]
2012-01-03    [0, 1, 2]
2012-01-04    [0, 1, 2]
Freq: D, dtype: object

In [4]: df.apply(lambda x: np.array([0, 1, 2]), axis=1, result_type='expand')
Out[4]: 
            0  1  2
2012-01-01  0  1  2
2012-01-02  0  1  2
2012-01-03  0  1  2
2012-01-04  0  1  2

In [5]: df.apply(lambda x: np.array([0, 1, 2]), axis=1, result_type='broadcast')
Out[5]: 
            A  B  C
2012-01-01  0  1  2
2012-01-02  0  1  2
2012-01-03  0  1  2
2012-01-04  0  1  2
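
For re-checking this on an installed pandas version, the session above can be condensed into a standalone script. This is only a sketch: `constant_row` is a hypothetical helper name, not something from the PR, and the printed summary is limited to the result type and shape.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.tile(np.arange(3), 4).reshape(4, -1) + 1,
    columns=["A", "B", "C"],
    index=pd.date_range("2012-01-01", periods=4),
)


def constant_row(row):
    # Hypothetical helper: ignores the row and returns a fixed-length array.
    return np.array([0, 1, 2])


# No result_type: one array object per row, collected in a Series.
default = df.apply(constant_row, axis=1)
# 'expand': array elements become columns labelled 0, 1, 2.
expanded = df.apply(constant_row, axis=1, result_type="expand")
# 'broadcast': array elements are written into the original columns.
broadcast = df.apply(constant_row, axis=1, result_type="broadcast")

for name, res in [("default", default), ("expand", expanded), ("broadcast", broadcast)]:
    print(name, type(res).__name__, res.shape)
```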

However, for axis=0 (the default), we do not yet follow the same rules, or honor the keyword, in all cases. Some examples:

  • For a list, the result depends on its length: if the length matches the number of rows, the list is expanded and the original index is preserved (instead of a new range index); otherwise a Series of lists is returned:

    In [16]: df.apply(lambda x: [0, 1, 2, 3])
    Out[16]: 
                A  B  C
    2012-01-01  0  0  0
    2012-01-02  1  1  1
    2012-01-03  2  2  2
    2012-01-04  3  3  3
    
    In [17]: df.apply(lambda x: [0, 1, 2, 3, 4])
    Out[17]: 
    A    [0, 1, 2, 3, 4]
    B    [0, 1, 2, 3, 4]
    C    [0, 1, 2, 3, 4]
    dtype: object
    

    (Passing result_type='expand' or result_type='broadcast' explicitly does work correctly here.)

  • For an array, the result is expanded even when the length does not match (so this differs from axis=1, and also from the list case):

    In [23]: df.apply(lambda x: np.array([0, 1, 2, 3]))
    Out[23]: 
                A  B  C
    2012-01-01  0  0  0
    2012-01-02  1  1  1
    2012-01-03  2  2  2
    2012-01-04  3  3  3
    
    In [24]: df.apply(lambda x: np.array([0, 1, 2, 3, 4]))
    Out[24]: 
       A  B  C
    0  0  0  0
    1  1  1  1
    2  2  2  2
    3  3  3  3
    4  4  4  4
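
The axis=0 cases listed above can be probed side by side with a small script. This is a rough sketch: `describe` and the `cases` labels are hypothetical names introduced here, and no expected output is shown because the non-matching-length results depend on the installed pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.tile(np.arange(3), 4).reshape(4, -1) + 1,
    columns=["A", "B", "C"],
    index=pd.date_range("2012-01-01", periods=4),
)


def describe(result):
    # Hypothetical helper: summarize a result as (class name, shape).
    return type(result).__name__, result.shape


# Return values of matching (4) and non-matching (5) length, as list and array.
cases = {
    "list, len 4": lambda col: [0, 1, 2, 3],
    "list, len 5": lambda col: [0, 1, 2, 3, 4],
    "array, len 4": lambda col: np.array([0, 1, 2, 3]),
    "array, len 5": lambda col: np.array([0, 1, 2, 3, 4]),
}

# axis=0 is the default, so each function is called once per column.
report = {label: describe(df.apply(func)) for label, func in cases.items()}
for label, (kind, shape) in report.items():
    print(f"{label:>12}: {kind} {shape}")
```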
    

So the question is: should we follow the same rules for axis=0 as for axis=1?
I would say: ideally yes. But doing so might break some existing behaviour (although it might be possible to make the change gradually, with warnings).
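
As an aside that is not part of the proposal itself: user code that wants the same shape regardless of list or array inference can return a Series from the applied function, which apply expands into a DataFrame on either axis. A minimal sketch, assuming the same `df` as in the examples above (`by_column` and `by_row` are names introduced here):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.tile(np.arange(3), 4).reshape(4, -1) + 1,
    columns=["A", "B", "C"],
    index=pd.date_range("2012-01-01", periods=4),
)

# axis=0: each column maps to a Series of length 5; the Series index
# (0..4) becomes the row index of the result.
by_column = df.apply(lambda col: pd.Series([0, 1, 2, 3, 4]))

# axis=1: each row maps to a Series of length 3; the Series index
# (0..2) becomes the column labels of the result.
by_row = df.apply(lambda row: pd.Series([0, 1, 2]), axis=1)

print(by_column.shape, by_row.shape)
```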

@jorisvandenbossche jorisvandenbossche added API Design Apply Apply, Aggregate, Transform, Map labels Feb 7, 2018
@jorisvandenbossche jorisvandenbossche added this to the 0.23.0 milestone Feb 7, 2018
@jreback jreback modified the milestones: 0.23.0, Next Major Release Apr 14, 2018
@mroeschke mroeschke added the Needs Discussion Requires discussion from core team before further action label Jun 18, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022