Improve Series.cut / Expr.cut: Add number of bins as an alternative #16947

Closed
Julian-J-S opened this issue Jun 14, 2024 · 1 comment
Labels
enhancement New feature or an improvement of an existing feature

Comments

@Julian-J-S
Contributor

Description

Pandas cut and qcut both accept the "number of bins" (e.g. 4) as an alternative to an explicit list of bins (e.g. [2, 5, 10]).

Idea

cut with number of bins N:

  • create N bins of equal width
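A minimal plain-Python sketch of the equal-width idea (no pandas or Polars involved). It assumes the bins should span exactly the minimum and maximum of the data. The helper name `equal_width_edges` is hypothetical:

```python
def equal_width_edges(values, n):
    """Return n + 1 edges splitting [min(values), max(values)] into n bins of equal width."""
    if n < 1:
        raise ValueError("n must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n
    edges = [lo + i * width for i in range(n)]
    edges.append(float(hi))  # use the exact maximum to avoid floating-point drift
    return edges


print(equal_width_edges([1, 1, 2, 2, 4, 4, 9, 9], 4))  # [1.0, 3.0, 5.0, 7.0, 9.0]
```

Degenerate input where all values are equal (width 0) would need separate handling. pandas, for instance, widens the range slightly in that case.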

qcut with number of bins N:

  • create N bins of equal probability
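For the equal-probability variant, the bin edges are quantiles of the data. The following sketch uses the standard library's `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between order statistics. It assumes that this interpolation is the desired one. On the sample data it gives the same edges as the pandas qcut example. The helper name `equal_probability_edges` is hypothetical:

```python
import statistics


def equal_probability_edges(values, n):
    """Return n + 1 edges so that each of the n bins holds roughly the same share of values."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # n - 1 interior cut points, linearly interpolated ("inclusive" method)
    interior = statistics.quantiles(values, n=n, method="inclusive")
    return [float(min(values))] + interior + [float(max(values))]


print(equal_probability_edges([1, 1, 2, 2, 4, 4, 9, 9], 4))  # [1.0, 1.75, 3.0, 5.25, 9.0]
```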

Pandas

Pandas qcut

pd.qcut([1, 1, 2, 2, 4, 4, 9, 9], q=4, retbins=True)
# ([(0.999, 1.75], (0.999, 1.75], (1.75, 3.0], (1.75, 3.0], (3.0, 5.25], (3.0, 5.25], (5.25, 9.0], (5.25, 9.0]]
#  Categories (4, interval[float64, right]): [(0.999, 1.75] < (1.75, 3.0] < (3.0, 5.25] < (5.25, 9.0]],
#  array([1.  , 1.75, 3.  , 5.25, 9.  ]))

Pandas cut

pd.cut([1, 1, 2, 2, 4, 4, 9, 9], bins=4, retbins=True)
# ([(0.992, 3.0], (0.992, 3.0], (0.992, 3.0], (0.992, 3.0], (3.0, 5.0], (3.0, 5.0], (7.0, 9.0], (7.0, 9.0]]
#  Categories (4, interval[float64, right]): [(0.992, 3.0] < (3.0, 5.0] < (5.0, 7.0] < (7.0, 9.0]],
#  array([0.992, 3.   , 5.   , 7.   , 9.   ]))
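The `0.992` lower edge is not a data value. pandas appears to widen the range by 0.1% of (max - min) on the open side, so that the minimum value still falls inside the first right-closed bin. Below is a plain-Python sketch of that rule. It assumes `right=True` (the pandas default) and is not pandas' actual code. The helper name `pandas_like_cut_edges` is hypothetical:

```python
def pandas_like_cut_edges(values, n, right=True):
    """Equal-width edges, widened by 0.1% of the range on the open side (assumed pandas-like behavior)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n
    edges = [lo + i * width for i in range(n)] + [float(hi)]
    adj = (hi - lo) * 0.001  # 0.1% of the data range
    if right:
        edges[0] -= adj  # bins are (a, b], so push the lowest edge below the minimum
    else:
        edges[-1] += adj  # bins are [a, b), so push the highest edge above the maximum
    return edges


print(pandas_like_cut_edges([1, 1, 2, 2, 4, 4, 9, 9], 4))  # approximately [0.992, 3.0, 5.0, 7.0, 9.0]
```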

Polars

Polars qcut (nearly identical, except that Polars uses infinite bounds at the outer edges)

pl.Series([1, 1, 2, 2, 4, 4, 9, 9]).qcut(4, include_breaks=True).struct.unnest()
shape: (8, 2)
┌─────────────┬──────────────┐
│ break_point ┆ category     │
│ ---         ┆ ---          │
│ f64         ┆ cat          │
╞═════════════╪══════════════╡
│ 1.75        ┆ (-inf, 1.75] │
│ 1.75        ┆ (-inf, 1.75] │
│ 3.0         ┆ (1.75, 3]    │
│ 3.0         ┆ (1.75, 3]    │
│ 5.25        ┆ (3, 5.25]    │
│ 5.25        ┆ (3, 5.25]    │
│ inf         ┆ (5.25, inf]  │
│ inf         ┆ (5.25, inf]  │
└─────────────┴──────────────┘
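To illustrate the ±inf convention: given sorted interior breaks, each value lands in a right-closed interval `(b[i-1], b[i]]`, and the two outermost intervals are open-ended. Below is a plain-Python sketch using `bisect_left`. The helper name `assign_right_closed` is hypothetical. The `:g` label formatting is an assumption chosen to mimic the labels in the table above and is not how Polars builds them:

```python
import math
from bisect import bisect_left


def assign_right_closed(values, breaks):
    """Map each value to a label "(lower, upper]" using sorted interior breaks and +/-inf outer edges."""
    edges = [-math.inf] + list(breaks) + [math.inf]
    labels = []
    for v in values:
        # bisect_left counts breaks strictly below v, so a value equal to a break stays in the lower bin
        i = bisect_left(breaks, v)
        labels.append(f"({edges[i]:g}, {edges[i + 1]:g}]")
    return labels


print(assign_right_closed([1, 1, 2, 2, 4, 4, 9, 9], [1.75, 3.0, 5.25]))
```

With the qcut breaks from the table (1.75, 3.0, 5.25), this yields the same category column as the Polars output.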

Polars cut 💥

  • There is no option to specify the number of bins
@Julian-J-S Julian-J-S added the enhancement New feature or an improvement of an existing feature label Jun 14, 2024
@stinodego
Member

Thanks for the input, but we are already planning to overhaul the API of cut and qcut. See #10468

I'll close this in favor of that one.

@stinodego closed this as not planned on Jun 14, 2024