ENH: add ignore_index argument to DataFrame.explode / Series.explode #34932

Closed
erfannariman opened this issue Jun 22, 2020 · 10 comments · Fixed by #34933
Labels: Enhancement; Needs Discussion (Requires discussion from core team before further action); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

@erfannariman
Member

Right now, DataFrame.explode repeats the original row's index label for every element of the exploded list-like. To stay consistent with methods like DataFrame.sort_values, DataFrame.append and pd.concat, we can add an ignore_index argument, which resets the index of the result to 0, 1, ..., n - 1.

import pandas as pd
df = pd.DataFrame({'id': range(0, 30, 10),
                   'values': [list('abc'), list('def'), list('ghi')]})
print(df)

   id     values
0   0  [a, b, c]
1  10  [d, e, f]
2  20  [g, h, i]

print(df.explode('values'))
   id values
0   0      a
0   0      b
0   0      c
1  10      d
1  10      e
1  10      f
2  20      g
2  20      h
2  20      i

Expected behaviour with addition of the argument:

df.explode('values', ignore_index=True)

   id values
0   0      a
1   0      b
2   0      c
3  10      d
4  10      e
5  10      f
6  20      g
7  20      h
8  20      i
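
For comparison, the expected output above can already be obtained without the proposed argument by chaining reset_index(drop=True) after explode. The snippet below is only a sketch of that workaround, reusing the example frame from this issue; it is not the implementation that was eventually merged.

```python
import pandas as pd

df = pd.DataFrame({
    "id": range(0, 30, 10),
    "values": [list("abc"), list("def"), list("ghi")],
})

# Workaround: explode first, then throw away the repeated index labels.
exploded = df.explode("values").reset_index(drop=True)

print(exploded)
```

The proposed ignore_index=True would fold the second step into the explode call itself.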
@erfannariman added the Enhancement and Needs Triage (Issue that has not been reviewed by a pandas team member) labels on Jun 22, 2020
@erfannariman
Member Author

If this change looks okay to one of the devs, I can submit a PR for this.

@erfannariman
Member Author

take

@TomAugspurger
Contributor

I think something like this was discussed when explode was originally implemented. @erfannariman can you go through the original pull request implementing explode and summarize the discussion on this point?

@TomAugspurger added the Needs Discussion (Requires discussion from core team before further action) and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels and removed the Needs Triage label on Jun 22, 2020
@erfannariman
Member Author

erfannariman commented Jun 22, 2020

@TomAugspurger
Contributor

TomAugspurger commented Jun 22, 2020 via email

@jreback
Contributor

jreback commented Jun 22, 2020

it’s ok to add this argument (was added elsewhere after explode existed)

@WillAyd
Member

WillAyd commented Jun 22, 2020

What's the upside of adding this as an argument instead of just calling reset_index?

@erfannariman
Member Author

Not sure if I'm in a position to comment on your question, but in terms of API design, isn't that in line with other methods like DataFrame.append, DataFrame.sort_values and pd.concat? Or do you mean in terms of the internals? @WillAyd

@WillAyd
Member

WillAyd commented Jun 22, 2020

Ah OK makes sense since we do elsewhere

@jorisvandenbossche
Member

For the ignore_index, another example where this was added recently is drop_duplicates (#30405) and sort_values (#30402).
And another reason to add it is that it can be a bit more performant (avoid an additional copy as you would have with reset_index(drop=True)).
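
As an illustration of the existing precedent, here is a small sketch of how ignore_index behaves on sort_values and drop_duplicates. The frame and its column names (key, val) are made up for this example, and it assumes a pandas version where both methods already accept the keyword.

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"key": [3, 1, 3, 2], "val": list("abcd")})

# Without ignore_index the original labels travel with the rows;
# with ignore_index=True the result is relabelled 0..n-1.
sorted_df = df.sort_values("key", ignore_index=True)
deduped = df.drop_duplicates(subset="key", ignore_index=True)

print(sorted_df)
print(deduped)
```

In both cases the result is the same as appending reset_index(drop=True), just expressed in one call.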


One aspect related to the index of the result that was briefly discussed in the original PR (#27267 (review)) is whether to add a level to the index with a "count", thus resulting in a MultiIndex (which could, e.g., be useful if you want to do an unstack in a next step).

I personally think that could still be useful, and we could potentially think about combining that in a single keyword. However, since ignore_index is already used in other places, probably better to consider this separately.
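
To make the "count" level idea concrete, here is a rough sketch of how it can be emulated today with groupby(level=0).cumcount(). The level name "count" and the follow-up unstack are assumptions for illustration; no such keyword is proposed in this issue.

```python
import pandas as pd

df = pd.DataFrame({
    "id": range(0, 30, 10),
    "values": [list("abc"), list("def"), list("ghi")],
})

exploded = df.explode("values")

# Position of each element within its original row (0, 1, 2, 0, 1, 2, ...).
count = exploded.groupby(level=0).cumcount().rename("count")

# Append it as a second index level, giving a MultiIndex (original label, count).
exploded = exploded.set_index(count, append=True)

# With the extra level the index is unique, so the values can be unstacked.
wide = exploded["values"].unstack("count")
print(wide)
```

The extra level makes the repeated labels unique again, which is what allows the unstack in the last step.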

@erfannariman changed the title from "ENH: add ignore_index argument to DataFrame.explode" to "ENH: add ignore_index argument to DataFrame.explode / Series.explode" on Jun 22, 2020
@jreback added this to the 1.1 milestone on Jun 23, 2020